Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/01/28 22:42:46 UTC

[jira] Created: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

relation-as-scalar - uses the last statement associated with the scalar alias
-----------------------------------------------------------------------------

                 Key: PIG-1834
                 URL: https://issues.apache.org/jira/browse/PIG-1834
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Thejas M Nair
             Fix For: 0.9.0, 0.8.0


Pig allows a relation alias to be re-used, i.e. to refer to different relations (statements) at different points in a script. I have not seen this in the documentation, but I have seen people write such queries.

For example -
{code}
l = load 'x' as (a,b);
l = filter l by a > 1;
l = foreach ...
store l into 'y';
{code}

At any point in the query, the alias "l" always refers to the relation most recently assigned to it in the portion of the Pig query above that point.

But with the relation-as-scalar feature, the alias is instead bound to the last relation assigned to it anywhere in the script.

For example -
{code}
l = load 'x' as (a,b);
A = load 'x' as (a,b);
B = foreach A generate a, l.a as la;
l = foreach l generate a+1 as a;
store B into 'b';
{code}
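
The difference between the two binding rules can be pictured with a small model. This is only a sketch in Python, not Pig's parser: the `SCRIPT` list and both `resolve_*` helpers are hypothetical names made up for illustration, and each statement is reduced to the alias it defines plus the aliases it references.

```python
# Hypothetical model of the script above: (alias defined, aliases referenced).
# None of these names come from Pig's code base.
SCRIPT = [
    ("l", []),          # 0: l = load 'x' as (a,b);
    ("A", []),          # 1: A = load 'x' as (a,b);
    ("B", ["A", "l"]),  # 2: B = foreach A generate a, l.a as la;
    ("l", ["l"]),       # 3: l = foreach l generate a+1 as a;
]


def resolve_positional(script, stmt_index, alias):
    """Index of the latest definition of `alias` strictly above stmt_index."""
    for i in range(stmt_index - 1, -1, -1):
        if script[i][0] == alias:
            return i
    raise KeyError(alias)


def resolve_last_in_script(script, alias):
    """Index of the last definition of `alias` anywhere in the script."""
    for i in range(len(script) - 1, -1, -1):
        if script[i][0] == alias:
            return i
    raise KeyError(alias)


# Scalar reference l.a inside statement 2 (the definition of B).
expected = resolve_positional(SCRIPT, 2, "l")   # the load
reported = resolve_last_in_script(SCRIPT, "l")  # the later foreach
print(expected, reported)  # prints "0 3"
```

Under the positional rule the scalar `l.a` in B binds to statement 0 (the load); under the whole-script rule it binds to statement 3 (the foreach), which is the behaviour described in this report.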

The alias l used as a scalar in the statement for alias B should refer to the load, but it refers to the foreach statement -
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-16
Map Plan
l: Store(file:/tmp/temp-953430379/tmp2006282146:org.apache.pig.impl.io.InterStorage) - scope-8
|
|---l: New For Each(false)[bag] - scope-7
    |   |
    |   Add[int] - scope-5
    |   |
    |   |---Cast[int] - scope-3
    |   |   |  
    |   |   |---Project[bytearray][0] - scope-2
    |   |
    |   |---Constant(1) - scope-4
    |
    |---l: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-1--------
Global sort: false
----------------

MapReduce node scope-17
Map Plan
B: Store(file:///Users/tejas/pig_type/trunk/b:org.apache.pig.builtin.PigStorage) - scope-15
|
|---B: New For Each(false,false)[bag] - scope-14
    |   |
    |   Project[bytearray][0] - scope-9
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-13
    |   |
    |   |---Constant(0) - scope-11
    |   |
    |   |---Constant(file:/tmp/temp-953430379/tmp2006282146) - scope-12
    |
    |---A: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
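
To read the plan: the ReadScalars call in job scope-17 takes a temp file path as a constant (scope-12), and that same path is the target of the Store in job scope-16 (scope-8), which is fed by the foreach (scope-7) rather than directly by the load (scope-1). A quick way to confirm that the two paths match is sketched below in Python. The two plan lines are copied from the output above with indentation trimmed; the regular expressions and variable names are illustrative assumptions, not part of any Pig tooling.

```python
import re

# Two lines copied from the plan above (indentation trimmed).
PLAN_LINES = [
    "l: Store(file:/tmp/temp-953430379/tmp2006282146:org.apache.pig.impl.io.InterStorage) - scope-8",
    "|---Constant(file:/tmp/temp-953430379/tmp2006282146) - scope-12",
]

# Path written by the Store at the top of the first job (output of the foreach).
store_match = re.search(r"Store\((file:[^:)]+):", PLAN_LINES[0])
# Path handed to ReadScalars as a constant in the second job.
scalar_match = re.search(r"Constant\((file:[^)]+)\)", PLAN_LINES[1])

store_path = store_match.group(1)
scalar_path = scalar_match.group(1)

# True means the scalar is read from the foreach's output, not from the raw load.
same_file = store_path == scalar_path
print(same_file)
```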



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1834:
--------------------------------

    Fix Version/s:     (was: 0.8.0)

> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Richard Ding
>             Fix For: 0.9.0

[jira] Assigned: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1834:
-----------------------------------

    Assignee: Richard Ding

> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Richard Ding
>             Fix For: 0.8.0, 0.9.0

[jira] Resolved: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-1834.
---------------------------------

    Resolution: Fixed

> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Richard Ding
>             Fix For: 0.9.0

[jira] Commented: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002836#comment-13002836 ] 

Richard Ding commented on PIG-1834:
-----------------------------------

This is fixed with the new parser changes.

> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Richard Ding
>             Fix For: 0.9.0

[jira] Updated: (PIG-1834) relation-as-scalar - uses the last statement associated with the scalar alias

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1834:
-------------------------------

    Description: 
Pig allows a relation alias to be re-used, i.e. to refer to different relations (statements) at different points in a script. I have not seen this in the documentation, but I have seen people write such queries.

For example -
{code}
l = load 'x' as (a,b);
l = filter l by a > 1;
l = foreach ...
store l into 'y';
{code}

At any point in the query, the alias "l" always refers to the relation most recently assigned to it in the portion of the Pig query above that point.

But with the relation-as-scalar feature, the alias is instead bound to the last relation assigned to it anywhere in the script.

For example -
{code}
l = load 'x' as (a,b);
A = load 'x' as (a,b);
B = foreach A generate a, l.a as la;
l = foreach l generate a+1 as a;
store B into 'b';
{code}

The alias l used as a scalar in the statement for alias B should refer to the load, but it refers to the foreach statement -
{code}

#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-16
Map Plan
l: Store(file:/tmp/temp-953430379/tmp2006282146:org.apache.pig.impl.io.InterStorage) - scope-8
|
|---l: New For Each(false)[bag] - scope-7
    |   |
    |   Add[int] - scope-5
    |   |
    |   |---Cast[int] - scope-3
    |   |   |  
    |   |   |---Project[bytearray][0] - scope-2
    |   |
    |   |---Constant(1) - scope-4
    |
    |---l: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-1--------
Global sort: false
----------------

MapReduce node scope-17
Map Plan
B: Store(file:///Users/tejas/pig_type/trunk/b:org.apache.pig.builtin.PigStorage) - scope-15
|
|---B: New For Each(false,false)[bag] - scope-14
    |   |
    |   Project[bytearray][0] - scope-9
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.ReadScalars)[int] - scope-13
    |   |
    |   |---Constant(0) - scope-11
    |   |
    |   |---Constant(file:/tmp/temp-953430379/tmp2006282146) - scope-12
    |
    |---A: Load(file:///Users/tejas/pig_type/trunk/x:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
{code}



> relation-as-scalar - uses the last statement associated with the scalar alias
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1834
>                 URL: https://issues.apache.org/jira/browse/PIG-1834
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>             Fix For: 0.8.0, 0.9.0