Posted to dev@pig.apache.org by "Arthur Zwiegincew (JIRA)" <ji...@apache.org> on 2008/08/22 00:02:44 UTC

[jira] Created: (PIG-390) Union doesn't work

Union doesn't work
------------------

                 Key: PIG-390
                 URL: https://issues.apache.org/jira/browse/PIG-390
             Project: Pig
          Issue Type: Bug
         Environment: Mac OS X
            Reporter: Arthur Zwiegincew


data files:

$ cat ~/tmp/data
1	1
2	1
3	10

$ cat ~/tmp/data-2
4	20
5	20

pig script:
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
dump both;

result (only the two tuples from data-2 are returned; the three tuples from data are missing):
(4, 20)
(5, 20)
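For reference, the script above should have produced all five tuples. Pig's UNION concatenates its inputs, keeps duplicate tuples, and does not guarantee any output order. The sketch below is a plain-Java model of that expected behavior only. `UnionModel` and `union` are hypothetical names, no Pig classes are used, and each tuple is modeled as a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of UNION semantics; no Pig classes involved.
public class UnionModel {

    // Concatenates every input relation into one list, keeping duplicates.
    // Pig does not promise an output order, so callers should not rely on it.
    @SafeVarargs
    public static List<List<String>> union(List<List<String>>... relations) {
        List<List<String>> out = new ArrayList<>();
        for (List<List<String>> relation : relations) {
            out.addAll(relation);
        }
        return out;
    }

    public static void main(String[] args) {
        // The two sample files from the report, as (x, y) tuples.
        List<List<String>> data = Arrays.asList(
                Arrays.asList("1", "1"),
                Arrays.asList("2", "1"),
                Arrays.asList("3", "10"));
        List<List<String>> data2 = Arrays.asList(
                Arrays.asList("4", "20"),
                Arrays.asList("5", "20"));
        List<List<String>> both = union(data, data2);
        System.out.println(both.size()); // 5
    }
}
```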


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-390) Union doesn't work

Posted by "Arthur Zwiegincew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627261#action_12627261 ] 

Arthur Zwiegincew commented on PIG-390:
---------------------------------------

Here's a workaround I'm using:

package com.cooliris.analytics;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 * Implements a UNIONALL Pig function. It accepts a tuple of the format <group, {bag-1}, {bag-2}, {bag-3}, ...>,
 * ignores the group field, and outputs a bag containing every tuple of bag-1, bag-2, bag-3, ... (duplicates are
 * kept, as with UNION). This is intended as a workaround for bug PIG-390 (Union doesn't work).
 * 
 * Instead of:
 *   combined = UNION data1, data2, data3;
 * ...do the following:
 *   cg_combined = COGROUP data1 BY 1, data2 BY 1, data3 BY 1;
 *   combined = FOREACH cg_combined GENERATE FLATTEN(com.cooliris.analytics.UNIONALL(*));
 * 
 * @author arthur@cooliris.com
 */
public class UNIONALL extends EvalFunc<DataBag> {

    @Override
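    public void exec(Tuple input, DataBag output) throws IOException {
        // Field 0 is the COGROUP key (the constant 1), so start at field 1.
        for (int i = 1; i < input.arity(); ++i) {
            // Each remaining field is the bag of tuples from one input relation;
            // copy all of them into the single output bag.
            for (Tuple nested : input.getBagField(i)) {
                output.add(nested);
            }
        }
    }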
}
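The workaround relies on the shape of a COGROUP result: grouping every relation by the constant 1 yields a single row whose field 0 is the key and whose fields 1..n hold one bag per input relation. UNIONALL skips the key and flattens the bags. The following is a minimal plain-Java model of that step, not Pig code; `CogroupUnionModel`, `unionAll`, and the `Object[]` row representation are hypothetical stand-ins for Pig's Tuple and DataBag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the COGROUP-by-constant workaround.
// A cogrouped row is modeled as Object[]: field 0 is the group key,
// fields 1..n are the bags (one per input relation).
public class CogroupUnionModel {

    // Mirrors UNIONALL.exec: skip field 0 (the constant group key) and
    // copy every tuple from each remaining bag into one output list.
    public static List<List<String>> unionAll(Object[] cogroupedRow) {
        List<List<String>> output = new ArrayList<>();
        for (int i = 1; i < cogroupedRow.length; ++i) {
            @SuppressWarnings("unchecked")
            List<List<String>> bag = (List<List<String>>) cogroupedRow[i];
            output.addAll(bag);
        }
        return output;
    }

    public static void main(String[] args) {
        // COGROUP data BY 1, data2 BY 1 yields a single row keyed by 1.
        Object[] row = {
            1,
            Arrays.asList(Arrays.asList("1", "1"), Arrays.asList("2", "1"), Arrays.asList("3", "10")),
            Arrays.asList(Arrays.asList("4", "20"), Arrays.asList("5", "20"))
        };
        System.out.println(unionAll(row).size()); // 5
    }
}
```

One consequence of this design: because every tuple shares the key 1, the whole union passes through a single group (in MapReduce terms, a single reducer), so the workaround may be slow on large inputs compared with a working UNION.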




[jira] Commented: (PIG-390) Union doesn't work

Posted by "Kevin Weil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636975#action_12636975 ] 

Kevin Weil commented on PIG-390:
--------------------------------

An update to this bug: it appears to be fixed in the types branch, using Hadoop 0.18.1 and the stable-1 svn tag.

Kevin
