Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2015/04/24 23:23:39 UTC

[jira] [Commented] (PIG-4511) Add columns to prune from PluckTuple

    [ https://issues.apache.org/jira/browse/PIG-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511777#comment-14511777 ] 

Cheolsoo Park commented on PIG-4511:
------------------------------------

[~daijy], actually I think we shouldn't commit this patch as is.

Three problems:

1) It contains unwanted changes, such as:
{code}
+ * 
+ * Additional arguments to this udf are columns to exclude from the relation matching this prefix (assuming this column is the end of the alias: e.g., if choose to exclude column y then exclude a::b::y using PluckTuple('a::','y'))
  *
  * Example:
- *
- * 1) Prefix
  * a = load 'a' as (x, y);
  * b = load 'b' as (x, y);
  * c = join a by x, b by x;
@@ -47,29 +47,28 @@ import com.google.common.collect.Lists;
  * c: {a::x: bytearray,a::y: bytearray,b::x: bytearray,b::y: bytearray}
  * describe d;
  * d: {plucked::a::x: bytearray,plucked::a::y: bytearray}
- *
- * 2) Regex
- * a = load 'a' as (x, y);
- * b = load 'b' as (x, y);
- * c = join a by x, b by x;
- * DEFINE pluck PluckTuple('.*::y');
- * d = foreach c generate FLATTEN(pluck(*));
- * describe c;
- * c: {a::x: bytearray,a::y: bytearray,b::x: bytearray,b::y: bytearray}
- * describe d;
- * d: {plucked::a::y: bytearray,plucked::a::y: bytearray}
  */
 public class PluckTuple extends EvalFunc<Tuple> {
     private static final TupleFactory mTupleFactory = TupleFactory.getInstance();
-    private static Pattern pattern;
+    private static Pattern prefixPattern;
{code}
What happened is that he generated his patch against my internal release branch, where I had committed {{PIG-4401-2.patch}}, whereas {{PIG-4401-3.patch}} is what was committed to Apache trunk.

2) His patch is missing the update to docs.

3) Won't the following change have an impact on Tez local mode?
{code}
-        pigServer = new PigServer(Util.getLocalTestMode());
+        pigServer = new PigServer(ExecType.LOCAL);
{code}

Actually, I was communicating with Joseph (we work for the same employer) to update his patch. If [~jbabcock] is busy, I can put up a new patch that addresses these issues.

> Add columns to prune from PluckTuple
> ------------------------------------
>
>                 Key: PIG-4511
>                 URL: https://issues.apache.org/jira/browse/PIG-4511
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Joseph Babcock
>            Assignee: Joseph Babcock
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: pluckTuple.patch
>
>
> Currently PluckTuple returns all columns in a relation that match a prefix predicate. This patch accepts a variable-length list of column names after the predicate; those columns are removed from the result.
> Example:
> a = load 'a' as (x:int,y:chararray,z:long);
> b = load 'b' as (x:int,y:chararray,z:long);
> c = join a by x, b by x;
> Define pluck PluckTuple('a::','x','z');
> d = foreach c generate FLATTEN(pluck(*));
> -- returns y from a only
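
For reference, the selection rule described in the issue can be sketched in plain Java. This is only an illustration of the intended behavior, not the code in pluckTuple.patch: the class name {{PluckSketch}} and the method {{keptIndices}} are hypothetical, and it assumes an excluded name is compared against the part of the alias after the last "::".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not part of Pig and not taken from the patch.
class PluckSketch {
    // Returns the positions of the aliases that start with `prefix` and whose
    // trailing name (the part after the last "::") is not in `excluded`.
    static List<Integer> keptIndices(List<String> aliases, String prefix, List<String> excluded) {
        Set<String> drop = new HashSet<>(excluded);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < aliases.size(); i++) {
            String alias = aliases.get(i);
            if (!alias.startsWith(prefix)) {
                continue; // belongs to a different relation
            }
            int sep = alias.lastIndexOf("::");
            String name = sep < 0 ? alias : alias.substring(sep + 2);
            if (!drop.contains(name)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Schema of c after the join in the example quoted above.
        List<String> schema = Arrays.asList("a::x", "a::y", "a::z", "b::x", "b::y", "b::z");
        // Prefix 'a::' with 'x' and 'z' excluded keeps only a::y (position 1).
        System.out.println(keptIndices(schema, "a::", Arrays.asList("x", "z")));
    }
}
```

With the schema from the example, only position 1 (a::y) is kept, which matches the "returns y from a only" comment in the description.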


