Posted to dev@pig.apache.org by "Arthur Zwiegincew (JIRA)" <ji...@apache.org> on 2008/08/22 00:02:44 UTC
[jira] Created: (PIG-390) Union doesn't work
Union doesn't work
------------------
Key: PIG-390
URL: https://issues.apache.org/jira/browse/PIG-390
Project: Pig
Issue Type: Bug
Environment: Mac OS X
Reporter: Arthur Zwiegincew
data files:
$ cat ~/tmp/data
1 1
2 1
3 10
$ cat ~/tmp/data-2
4 20
5 20
pig script:
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
dump both;
result (only the rows from data-2 are returned; the three rows from data are missing):
(4, 20)
(5, 20)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-390) Union doesn't work
Posted by "Arthur Zwiegincew (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627261#action_12627261 ]
Arthur Zwiegincew commented on PIG-390:
---------------------------------------
Here's a workaround I'm using:
package com.cooliris.analytics;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 * Implements a UNIONALL Pig function. It accepts a tuple of the format <unused, {bag-1}, {bag-2}, {bag-3}, ...>
 * and outputs a set of tuples corresponding to UNION bag-1, bag-2, bag-3, ... . This is intended as a workaround
 * to bug PIG-390: Union doesn't work.
 *
 * Instead of:
 *     combined = UNION data1, data2, data3;
 * ...do the following:
 *     cg_combined = COGROUP data1 BY 1, data2 BY 1, data3 BY 1;
 *     combined = FOREACH cg_combined GENERATE FLATTEN(com.cooliris.analytics.UNIONALL(*));
 *
 * @author arthur@cooliris.com
 */
public class UNIONALL extends EvalFunc<DataBag> {
    @Override
    public void exec(Tuple input, DataBag output) throws IOException {
        // Field 0 is the unused group key (the constant 1), so start at field 1.
        for (int i = 1; i < input.arity(); ++i) {
            // Copy every tuple of the i-th cogrouped bag into the output bag.
            for (Tuple nested : input.getBagField(i)) {
                output.add(nested);
            }
        }
    }
}
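For anyone who wants to see what the UDF does without a Pig install, here is a rough sketch of the same loop using plain Java collections. It is not Pig code: the class name UnionAllSketch, the unionAll method, and the modeling of tuples as List<Integer> and bags as lists of tuples are all assumptions made for illustration. The input rows are the ones from the data files in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UNIONALL UDF; uses java.util only, no Pig classes.
final class UnionAllSketch {

    // A cogrouped row is modeled as: field 0 = group key, fields 1..n = bags.
    // Each bag is a List of tuples, and each tuple is a List<Integer>.
    static List<List<Integer>> unionAll(List<Object> cogroupedRow) {
        List<List<Integer>> output = new ArrayList<>();
        // Skip field 0 (the unused group key), as the UDF does.
        for (int i = 1; i < cogroupedRow.size(); ++i) {
            @SuppressWarnings("unchecked")
            List<List<Integer>> bag = (List<List<Integer>>) cogroupedRow.get(i);
            output.addAll(bag);
        }
        return output;
    }

    public static void main(String[] args) {
        // Rows from ~/tmp/data and ~/tmp/data-2 in the report.
        List<List<Integer>> data = Arrays.asList(
                Arrays.asList(1, 1), Arrays.asList(2, 1), Arrays.asList(3, 10));
        List<List<Integer>> data2 = Arrays.asList(
                Arrays.asList(4, 20), Arrays.asList(5, 20));

        // COGROUP ... BY 1 puts everything under the single key 1.
        List<Object> row = Arrays.<Object>asList(1, data, data2);

        // prints [[1, 1], [2, 1], [3, 10], [4, 20], [5, 20]]
        System.out.println(unionAll(row));
    }
}
```

Because every relation is cogrouped by the same constant, there is a single group holding one bag per input, and concatenating those bags yields all five rows rather than only the two from data-2.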
[jira] Commented: (PIG-390) Union doesn't work
Posted by "Kevin Weil (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636975#action_12636975 ]
Kevin Weil commented on PIG-390:
--------------------------------
An update to this bug: it appears to be fixed in the types branch, using Hadoop 0.18.1 and the stable-1 svn tag.
Kevin