Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2008/07/15 02:21:31 UTC
[jira] Created: (PIG-313) Error handling aggregate of a computation
Error handling aggregate of a computation
-----------------------------------------
Key: PIG-313
URL: https://issues.apache.org/jira/browse/PIG-313
Project: Pig
Issue Type: Bug
Affects Versions: types_branch
Reporter: Pradeep Kamath
Fix For: types_branch
Query which fails:
{code}
a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age*a.gpa);
store c into ':OUTPATH:';
{code}
Error output:
{quote}
2008-07-14 16:34:08,684 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: testhost.com:8020
2008-07-14 16:34:08,741 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
2008-07-14 16:34:08,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: testhost.com:50020
2008-07-14 16:34:09,251 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot evaluate output type of Mul/Div Operator
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem resolving LOForEach schema
2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe problem found during validation org.apache.pig.impl.plan.PlanValidationException: An unexpected exception caused the validation to stop
2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Unable to store for alias: c
2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - java.io.IOException: Unable to store for alias: c
{quote}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-313) Error handling aggregate of a computation
Posted by "Pi Song (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613933#action_12613933 ]
Pi Song commented on PIG-313:
-----------------------------
I think we have discussed this before, and the conclusion was that we don't support this.
Consider this query:
{noformat}
b = cogroup a1 by name, a2 by name;
c = foreach b generate group, SUM(a1.age*a2.gpa);
store c into ':OUTPATH:';
{noformat}
This makes it difficult for us because a1.age yields a bag and a2.gpa also yields a bag, and for a given key the two bags can contain different numbers of tuples.
What is the definition of a bag multiplied by a bag?
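To make the question concrete, the Python sketch below models one cogroup key with two hypothetical bags of different sizes and compares two candidate definitions of bag-times-bag: position-by-position pairing and a cross product. Neither is claimed to be what Pig does; the data and function names are illustrative assumptions. Pig bags also carry no ordering guarantee, so even the positional pairing is not well defined.

```python
from itertools import product

# Hypothetical cogroup output for a single key. Each bag is a list of tuples:
# a1 tuples are (name, age), a2 tuples are (name, gpa). The sizes differ.
a1_bag = [("alice", 20), ("alice", 22)]
a2_bag = [("alice", 3.0), ("alice", 3.5), ("alice", 4.0)]


def pairwise_product(left, right):
    """Candidate definition 1 (hypothetical): multiply element by element.

    zip() stops at the shorter bag, so unmatched tuples are silently dropped.
    """
    return [lt[1] * rt[1] for lt, rt in zip(left, right)]


def cross_product(left, right):
    """Candidate definition 2 (hypothetical): multiply every left value by every right value."""
    return [lt[1] * rt[1] for lt, rt in product(left, right)]


print(sum(pairwise_product(a1_bag, a2_bag)))  # 137.0
print(sum(cross_product(a1_bag, a2_bag)))     # 441.0
```

The two candidates disagree on the same input, so supporting this expression would mean choosing and documenting one rule rather than inferring an obvious one.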
[jira] Commented: (PIG-313) Error handling aggregate of a computation
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614097#action_12614097 ]
Pradeep Kamath commented on PIG-313:
------------------------------------
Per http://wiki.apache.org/pig/PigTypesFunctionalSpec, the last section, "Argument Construction for Functions", says that the computation will be done on the fields of each tuple in the group, and that the computed results will be stored in a bag and then supplied to SUM. Is this not going to be the case in the new Pig types release? If not, the wiki should be updated.
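A minimal Python model of the argument construction described in this comment, assuming each group is a list of (name, age, gpa) tuples: age*gpa is evaluated once per tuple, the results are collected into a new bag, and only that bag is passed to SUM. The sample rows and helper names are hypothetical, and this is a sketch of the described semantics, not Pig's implementation.

```python
from collections import defaultdict

# Hypothetical rows matching the schema (name:chararray, age:int, gpa:double).
rows = [("alice", 20, 3.0), ("bob", 21, 2.0), ("alice", 22, 3.5)]


def group_by_name(tuples):
    """Model of 'group a by name': map each key to a bag (list) of tuples."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return groups


def sum_of_products(bag):
    """Model of SUM(a.age*a.gpa) with per-tuple argument construction:
    evaluate age*gpa for each tuple, collect the results into a new bag,
    then aggregate that bag."""
    products = [age * gpa for (_name, age, gpa) in bag]
    return sum(products)


result = {key: sum_of_products(bag) for key, bag in group_by_name(rows).items()}
print(result)  # {'alice': 137.0, 'bob': 42.0}
```

Here both operands come from the same tuple, so the pairing is unambiguous; in the cogroup case they come from two separate bags.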
[jira] Commented: (PIG-313) Error handling aggregate of a computation
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616197#action_12616197 ]
Pradeep Kamath commented on PIG-313:
------------------------------------
Another case of this issue is the following:
{code}
a = load 'singlefile/studenttab10k' as (name, age, gpa);
b = group a ALL;
c = foreach b generate SUM((int)(a.age)), MIN((int)(a.age)), MAX((int)(a.age)), AVG((int)(a.age)), MIN((chararray)(a.name)), MAX((chararray)(a.name)), SUM((double)(a.gpa)), MIN((double)(a.gpa)), MAX((double)(a.gpa)), AVG((double)(a.gpa));
store c into 'outdir';
{code}
In this case, the cast fails because it tries to cast a bag of bytearrays to int. Instead, it should cast each bytearray to int and then supply the resulting bag of ints to SUM(), MIN(), MAX(), and so on.
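A Python sketch of the per-element behavior requested here, assuming untyped field values are modeled as raw bytes standing in for Pig's bytearray. Casting the bag as a whole has no meaning, so the cast is mapped over each element before the aggregates run. The sample values and function names are hypothetical.

```python
# Hypothetical bag of untyped 'age' values, modeled as bytes (Pig's bytearray).
age_bag = [b"20", b"22", b"18"]


def cast_bag_to_int(bag):
    """Apply the (int) cast to each element rather than to the bag itself."""
    return [int(value) for value in bag]


def aggregates(values):
    """Model of SUM, MIN, MAX and AVG over a bag of already-cast values."""
    return sum(values), min(values), max(values), sum(values) / len(values)


# Casting the whole bag is the failing case: there is no int value for a bag.
try:
    int(age_bag)
except TypeError:
    print("cannot cast a bag to int")

ages = cast_bag_to_int(age_bag)
print(aggregates(ages))  # (60, 18, 22, 20.0)
```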
[jira] Updated: (PIG-313) Error handling aggregate of a computation
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-313:
-------------------------------
Priority: Minor (was: Major)
[jira] Resolved: (PIG-313) Error handling aggregate of a computation
Posted by "Pi Song (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pi Song resolved PIG-313.
-------------------------
Resolution: Won't Fix
[jira] Reopened: (PIG-313) Error handling aggregate of a computation
Posted by "Pi Song (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pi Song reopened PIG-313:
-------------------------
That sounds right. In that case, this is a problem in the parser.
[jira] Commented: (PIG-313) Error handling aggregate of a computation
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628523#action_12628523 ]
Olga Natkovich commented on PIG-313:
------------------------------------
Pi is correct: we do not support this right now. One idea we have considered for future work is to define the + operator on bags to match SQL semantics. Other approaches are also possible.