Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2008/07/15 02:21:31 UTC

[jira] Created: (PIG-313) Error handling aggregate of a computation

Error handling aggregate of a computation
-----------------------------------------

                 Key: PIG-313
                 URL: https://issues.apache.org/jira/browse/PIG-313
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch


Query which fails:

{code}
a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age*a.gpa);
store c into ':OUTPATH:';
{code}

Error output:
{quote}
2008-07-14 16:34:08,684 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: testhost.com:8020
2008-07-14 16:34:08,741 [main] WARN  org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
2008-07-14 16:34:08,995 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: testhost.com:50020
2008-07-14 16:34:09,251 [main] WARN  org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot evaluate output type of Mul/Div Operator
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem resolving LOForEach schema
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe problem found during validation org.apache.pig.impl.plan.PlanValidationException: An unexpected exception caused the validation to stop 
2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Unable to store for alias: c
2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - java.io.IOException: Unable to store for alias: c
{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-313) Error handling aggregate of a computation

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613933#action_12613933 ] 

Pi Song commented on PIG-313:
-----------------------------

I think we have discussed this before, and the conclusion was that we don't support it.

Consider this query:
{noformat}
b = cogroup a1 by name, a2 by name;
c = foreach b generate group, SUM(a1.age*a2.gpa);
store c into ':OUTPATH:';
{noformat}

This is difficult for us because a1.age gives a bag and a2.gpa also gives a bag.
What is the definition of a bag multiplied by a bag?
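
To make the ambiguity concrete, here is a minimal Python sketch (not Pig code). It models each bag as a list and compares two plausible readings of "bag times bag". The bag contents and both helper names are invented for illustration; neither reading is claimed to be what Pig does or should do.

```python
from itertools import product

# Hypothetical bags for a single cogroup key; values are made up.
a1_age = [20, 30]          # stands in for the bag a1.age
a2_gpa = [3.0, 2.0, 4.0]   # stands in for the bag a2.gpa (different size)


def pairwise_sum(xs, ys):
    """Reading 1: multiply position by position, then sum.

    Only defined when both bags have the same size, and bags are
    unordered, so even then the pairing is arbitrary.
    """
    if len(xs) != len(ys):
        raise ValueError("bags differ in size; positional pairing is undefined")
    return sum(x * y for x, y in zip(xs, ys))


def cross_product_sum(xs, ys):
    """Reading 2: multiply every element of xs by every element of ys, then sum."""
    return sum(x * y for x, y in product(xs, ys))
```

With these inputs the first reading raises an error because the bags differ in size, while the second reading produces a value. Even for equal-sized bags the two readings generally disagree, which is why a single definition is hard to pick.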


[jira] Commented: (PIG-313) Error handling aggregate of a computation

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614097#action_12614097 ] 

Pradeep Kamath commented on PIG-313:
------------------------------------

Per http://wiki.apache.org/pig/PigTypesFunctionalSpec, the last section, "Argument Construction for Functions", says that the computation will be done on the fields of each tuple in the group, and that the computed results will be stored in a bag and then supplied to SUM. Is this not going to be the case in the new Pig types release? If not, the wiki should be updated.
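
The semantics described above (compute per tuple within the group, collect the results in a bag, pass the bag to SUM) can be sketched in Python as follows. This is only a model of the described behavior, not Pig's implementation; the sample rows and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows shaped like (name, age, gpa); values are made up.
rows = [("alice", 20, 3.5), ("bob", 30, 2.0), ("alice", 22, 4.0)]


def sum_of_products_by_name(rows):
    """Group by name, compute age * gpa per tuple, then sum each group's bag."""
    bags = defaultdict(list)
    for name, age, gpa in rows:
        # The per-tuple result goes into the bag for this group.
        bags[name].append(age * gpa)
    # Each bag of computed values is what the aggregate would receive.
    return {name: sum(bag) for name, bag in bags.items()}
```

Because both fields come from the same tuple, there is exactly one product per tuple and no question of how to pair elements.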


[jira] Commented: (PIG-313) Error handling aggregate of a computation

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616197#action_12616197 ] 

Pradeep Kamath commented on PIG-313:
------------------------------------

Another case of this issue is the following:

{code}
a = load 'singlefile/studenttab10k' as (name, age, gpa);
b = group a ALL;
c = foreach b generate SUM((int)(a.age)), MIN((int)(a.age)), MAX((int)(a.age)), AVG((int)(a.age)), MIN((chararray)(a.name)), MAX((chararray)(a.name)), SUM((double)(a.gpa)), MIN((double)(a.gpa)), MAX((double)(a.gpa)), AVG((double)(a.gpa));
store c into 'outdir';
{code}
In this case, the cast fails because it tries to cast a bag of bytearrays to int. However, it should really cast each bytearray to int and then supply the bag of ints to SUM(), etc.
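
The difference between casting the bag and casting each element can be sketched in Python. This is a model of the expected behavior only, not Pig code; `cast_bag` and the sample values are hypothetical.

```python
def cast_bag(bag, to_type):
    """Cast each element of the bag, rather than the bag itself."""
    # Elements are raw bytes (standing in for bytearray fields), so decode first.
    return [to_type(raw.decode()) for raw in bag]


# Hypothetical untyped values, as they might arrive from a load without a schema.
ages = [b"18", b"25", b"31"]

# Expected behavior: cast element by element, then aggregate the typed bag.
total_age = sum(cast_bag(ages, int))
youngest = min(cast_bag(ages, int))
```

Applying the cast to the whole collection instead, as in `int(ages)`, fails with a `TypeError` in Python, which mirrors the failure described above.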


[jira] Updated: (PIG-313) Error handling aggregate of a computation

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-313:
-------------------------------

    Priority: Minor  (was: Major)


[jira] Resolved: (PIG-313) Error handling aggregate of a computation

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song resolved PIG-313.
-------------------------

    Resolution: Won't Fix


[jira] Reopened: (PIG-313) Error handling aggregate of a computation

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song reopened PIG-313:
-------------------------


That sounds right. Then this is a problem in the parser.


[jira] Commented: (PIG-313) Error handling aggregate of a computation

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628523#action_12628523 ] 

Olga Natkovich commented on PIG-313:
------------------------------------

Pi is correct - we do not support this right now. One idea we considered for future work is to define the + operator on bags to match SQL semantics. Other approaches are also possible.
