Posted to dev@pig.apache.org by "Utkarsh Srivastava (JIRA)" <ji...@apache.org> on 2007/12/13 06:17:43 UTC

[jira] Created: (PIG-51) Combiner gives wrong result in the presence of flattening

Combiner gives wrong result in the presence of flattening
---------------------------------------------------------

                 Key: PIG-51
                 URL: https://issues.apache.org/jira/browse/PIG-51
             Project: Pig
          Issue Type: Bug
            Reporter: Utkarsh Srivastava
            Priority: Critical


If you do something like

a = load ... as (f1,f2,f3);
b = group a by (f1,f2);
c = foreach b generate flatten(group), SUM(a.f3);

The reduce side refers to fields by number, expecting that the data has not been flattened yet. But if the combiner kicks in, it has already flattened the group, so the column references are wrong.
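
The mismatch can be illustrated with a small Python sketch. This is not Pig's code: the tuple layouts and the names reduce_sum, unflattened, flattened and BAG_COLUMN are assumptions made up for illustration. It only shows how a positional reference that is correct for a (group, bag) record picks the wrong field once the group has already been flattened.

```python
# Illustrative model only, not Pig internals.
# Layout the reduce side expects: (group, bag), where group = (f1, f2)
# and the bag holds the f3 values (or partial sums) for that group.
unflattened = (("x", "y"), [1, 2, 3])

# Layout if an earlier stage has already flattened the group:
# (f1, f2, bag). Every column after the group shifts right by one.
flattened = ("x", "y", [1, 2, 3])

BAG_COLUMN = 1  # positional reference compiled for the unflattened layout


def reduce_sum(record):
    """Sum the bag found at the precompiled column position."""
    bag = record[BAG_COLUMN]
    if not isinstance(bag, list):
        raise TypeError(f"column {BAG_COLUMN} holds {bag!r}, not a bag")
    return sum(bag)


ok = reduce_sum(unflattened)  # column 1 is the bag, as expected

try:
    reduce_sum(flattened)  # column 1 is now f2, so the reference is wrong
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

In this toy model the wrong reference fails loudly. In a real job the shifted column could just as well hold a value of a compatible type, which would yield a wrong result rather than an error.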
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Utkarsh Srivastava (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552022 ] 

Utkarsh Srivastava commented on PIG-51:
---------------------------------------

It seems there is an empty tuple in your data set (due to which id
cannot be resolved). That's what is throwing the exception. In fact,
the combiner doesn't trigger in your query.
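
To make that failure mode concrete, here is a minimal Python sketch of projecting column 6 from rows of differing width. It is an illustration, not Pig's implementation: project, ID_COLUMN and the sample rows are hypothetical. A row with fewer than seven fields cannot supply id, which mirrors the "Column number out of range: 6" errors reported in this thread.

```python
ID_COLUMN = 6  # position of `id` in the 12-field load schema


def project(row, column):
    """Return row[column], failing loudly when the row is too short."""
    if column >= len(row):
        raise IndexError(f"Column number out of range: {column} -- {row!r}")
    return row[column]


rows = [
    ("/search/click", "d", "m", "w", "d", "h", "id-1", "v", "u", "e", "q", "0"),
    (),              # empty tuple (hypothetical cause: a blank input line)
    ("full", "50"),  # short tuple, like the second error in the report
]

# Dropping rows that are too narrow before projecting avoids the exception.
wide_enough = [r for r in rows if len(r) > ID_COLUMN]
ids = [project(r, ID_COLUMN) for r in wide_enough]

try:
    project(rows[1], ID_COLUMN)
    failed = False
except IndexError:
    failed = True
```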

Is there a reason why you do one filter outside the foreach and one
inside it? Couldn't both filters be applied before grouping? That
would be more efficient, and the combiner would probably kick in.
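
One thing worth checking before moving the filter is whether the two forms produce the same groups. The Python sketch below models both variants on made-up data; the events list and both function names are hypothetical, and this is plain Python, not Pig semantics.

```python
from collections import defaultdict

# Hypothetical (id, eventType) events; '-' marks a missing id.
events = [
    ("u1", "/search/click"),
    ("u1", "/search/view"),
    ("u2", "/search/view"),
    ("-", "/search/click"),
    ("u3", "/search/click"),
]


def counts_filter_inside(rows):
    """Filter on id, group by id, then count clicks within each group."""
    groups = defaultdict(list)
    for id_, event_type in rows:
        if id_ != "-":
            groups[id_].append(event_type)
    return {k: sum(1 for e in v if e == "/search/click") for k, v in groups.items()}


def counts_filter_before(rows):
    """Apply both filters first, then group by id and count."""
    groups = defaultdict(int)
    for id_, event_type in rows:
        if id_ != "-" and event_type == "/search/click":
            groups[id_] += 1
    return dict(groups)


inside = counts_filter_inside(events)
before = counts_filter_before(events)
```

In this model the click totals agree, but ids with no clicks disappear when both filters run first, so a later count over the groups could differ from the original query.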

Utkarsh







[jira] Commented: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551565 ] 

Alan Gates commented on PIG-51:
-------------------------------

+1, patch looks good.



[jira] Commented: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551935 ] 

Ted Dunning commented on PIG-51:
--------------------------------


I think I am still seeing this issue or a cousin even after applying this patch.  I don't know enough to be sure, however.

grunt> ls /logs/search/2007/12/10
/logs/search/2007/12/10/part-00000<r 3>	1313515859
/logs/search/2007/12/10/part-00001<r 3>	1313535390
/logs/search/2007/12/10/part-00002<r 3>	1313485045
/logs/search/2007/12/10/part-00003<r 3>	1313536061
grunt>  a = load '/logs/search/2007/12/10' as (eventType, date, month,
week, day, hour, id, videoId, VisitorUID, engineName, query, offset);
grunt>  b = filter a by (id neq '-');
grunt>  c = group b by id;
grunt>  describe c
c: (group, b: (eventType, date, month, week, day, hour, id, videoId, VisitorUID, engineName, query, offset ) )
grunt>  d = foreach c {
 click = filter b by eventType eq '/search/click';
 generate COUNT(click);
 }
grunt>  describe d
d: (count1 )
grunt>  e = group d by 1;
grunt>  describe e
e: (group: ( ), d: (count1 ) )
grunt>  f = foreach e generate COUNT(*), SUM(d.count1);
grunt> dump f

----- MapReduce Job -----
Input: [/logs/search/2007/12/10:org.apache.pig.builtin.PigStorage()]
Map: [[*]->[FILTER BY ([PROJECT $6] neq ['-'])]]
Group: [GENERATE {[PROJECT $6],[*]}]
Combine: null
Reduce: GENERATE {[COUNT(GENERATE {[PROJECT $1]->[FILTER BY ([PROJECT $0] eq ['/search/click'])]})]}
Output: /tmp/temp1435257199/tmp1109313480:org.apache.pig.builtin.BinStorage
Split: null
Map parallelism: -1
Reduce parallelism: -1
Job jar size = 482135
2007-12-14 12:04:44,776 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:04:57,832 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:04:59,841 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:01,849 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:03,857 [main] INFO  org.apache.pig - Pig progress = 0%
2007-12-14 12:05:05,865 [main] INFO  org.apache.pig - Pig progress = 1%
2007-12-14 12:05:07,873 [main] INFO  org.apache.pig - Pig progress = 1%
2007-12-14 12:05:09,881 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:11,889 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:13,897 [main] INFO  org.apache.pig - Pig progress = 2%
2007-12-14 12:05:15,905 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:17,913 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:21,929 [main] INFO  org.apache.pig - Pig progress = 3%
2007-12-14 12:05:23,937 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:25,945 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:27,953 [main] INFO  org.apache.pig - Pig progress = 4%
2007-12-14 12:05:29,961 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:31,969 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:33,977 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:37,993 [main] INFO  org.apache.pig - Pig progress = 5%
2007-12-14 12:05:40,001 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:42,009 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:44,016 [main] INFO  org.apache.pig - Pig progress = 6%
2007-12-14 12:05:46,024 [main] INFO  org.apache.pig - Pig progress = 7%
2007-12-14 12:05:48,032 [main] INFO  org.apache.pig - Pig progress = 7%
2007-12-14 12:05:50,040 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:52,051 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:54,060 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:56,068 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:05:58,077 [main] INFO  org.apache.pig - Pig progress = 8%
2007-12-14 12:06:00,085 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:02,092 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:04,100 [main] INFO  org.apache.pig - Pig progress = 9%
2007-12-14 12:06:08,116 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:10,124 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:12,133 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:18,160 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:20,168 [main] INFO  org.apache.pig - Pig progress = 10%
2007-12-14 12:06:22,176 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:24,184 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:26,192 [main] INFO  org.apache.pig - Pig progress = 11%
2007-12-14 12:06:28,201 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:30,208 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:32,216 [main] INFO  org.apache.pig - Pig progress = 12%
2007-12-14 12:06:34,224 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:36,232 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:38,240 [main] INFO  org.apache.pig - Pig progress = 13%
2007-12-14 12:06:40,251 [main] INFO  org.apache.pig - Pig progress = 14%
2007-12-14 12:06:42,260 [main] INFO  org.apache.pig - Pig progress = 14%
2007-12-14 12:06:44,268 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:46,276 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:48,285 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:50,292 [main] INFO  org.apache.pig - Pig progress = 15%
2007-12-14 12:06:52,300 [main] INFO  org.apache.pig - Pig progress = 16%
2007-12-14 12:06:56,316 [main] INFO  org.apache.pig - Pig progress = 16%
2007-12-14 12:06:58,324 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:00,332 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:02,340 [main] INFO  org.apache.pig - Pig progress = 17%
2007-12-14 12:07:04,348 [main] INFO  org.apache.pig - Pig progress = 17%
...
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000071 java.lang.RuntimeException: java.io.IOException: Column number out of range: 6 -- (                 )
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
	at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- (                 )
	at org.apache.pig.data.Tuple.getField(Tuple.java:147)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
	... 7 more

2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000072
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000073
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000074
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000075 java.lang.RuntimeException: java.io.IOException: Column number out of range: 6 -- (full, 50)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
	at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
	at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
	at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
	at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
	at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- (full, 50)
	at org.apache.pig.data.Tuple.getField(Tuple.java:147)
	at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
	... 7 more

2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000076
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task (map) tip_200712121227_0004_m_000079
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000000
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000001
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000002
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task (reduce) tip_200712121227_0004_r_000003
Job failed
grunt> 



[jira] Resolved: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-51.
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.1.0

Fix for this was checked in some time ago (12/13/07).



[jira] Updated: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Utkarsh Srivastava (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated PIG-51:
----------------------------------

    Attachment: combiner-flatten.patch



[jira] Updated: (PIG-51) Combiner gives wrong result in the presence of flattening

Posted by "Utkarsh Srivastava (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated PIG-51:
----------------------------------

    Patch Info: [Patch Available]
