Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2015/04/24 23:23:39 UTC
[jira] [Commented] (PIG-4511) Add columns to prune from PluckTuple
[ https://issues.apache.org/jira/browse/PIG-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511777#comment-14511777 ]
Cheolsoo Park commented on PIG-4511:
------------------------------------
[~daijy], actually I think we shouldn't commit this patch as is.
Three problems:
1) It contains unwanted changes, such as:
{code}
+ *
+ * Additional arguments to this udf are columns to exclude from the relation matching this prefix (assuming this column is the end of the alias: e.g., if choose to exclude column y then exclude a::b::y using PluckTuple('a::','y'))
*
* Example:
- *
- * 1) Prefix
* a = load 'a' as (x, y);
* b = load 'b' as (x, y);
* c = join a by x, b by x;
@@ -47,29 +47,28 @@ import com.google.common.collect.Lists;
* c: {a::x: bytearray,a::y: bytearray,b::x: bytearray,b::y: bytearray}
* describe d;
* d: {plucked::a::x: bytearray,plucked::a::y: bytearray}
- *
- * 2) Regex
- * a = load 'a' as (x, y);
- * b = load 'b' as (x, y);
- * c = join a by x, b by x;
- * DEFINE pluck PluckTuple('.*::y');
- * d = foreach c generate FLATTEN(pluck(*));
- * describe c;
- * c: {a::x: bytearray,a::y: bytearray,b::x: bytearray,b::y: bytearray}
- * describe d;
- * d: {plucked::a::y: bytearray,plucked::a::y: bytearray}
*/
public class PluckTuple extends EvalFunc<Tuple> {
private static final TupleFactory mTupleFactory = TupleFactory.getInstance();
- private static Pattern pattern;
+ private static Pattern prefixPattern;
{code}
What happened is that he generated his patch against my internal release branch, where I committed {{PIG-4401-2.patch}}, whereas {{PIG-4401-3.patch}} is what was committed to Apache trunk.
2) His patch is missing the update to docs.
3) Won't the following change affect Tez local mode?
{code}
- pigServer = new PigServer(Util.getLocalTestMode());
+ pigServer = new PigServer(ExecType.LOCAL);
{code}
Actually, I was communicating with Joseph (we work for the same employer) to update his patch. If [~jbabcock] is busy, I can put up a new patch that addresses these issues.
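For reference while the patch is being reworked, the exclusion behavior described in the javadoc quoted under problem 1 can be sketched roughly as follows. This is only an illustration and not the code from the patch: {{PluckSketch}} and {{keptAliases}} are hypothetical names, the prefix is treated as a plain string rather than a regex, and an excluded column is assumed to match the last "::"-separated segment of an alias. The sketch also ignores the {{plucked::}} prefix that the UDF adds to output aliases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pig: models which aliases survive the pluck.
final class PluckSketch {

    // Returns the aliases that start with `prefix` and whose last
    // "::"-separated segment is not listed in `excluded`.
    // Assumption: plain startsWith matching, not the regex used by PluckTuple.
    static List<String> keptAliases(List<String> aliases, String prefix, String... excluded) {
        Set<String> drop = new HashSet<>(Arrays.asList(excluded));
        List<String> kept = new ArrayList<>();
        for (String alias : aliases) {
            if (!alias.startsWith(prefix)) {
                continue; // alias belongs to a different relation
            }
            int sep = alias.lastIndexOf("::");
            String column = sep < 0 ? alias : alias.substring(sep + 2);
            if (!drop.contains(column)) {
                kept.add(alias);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Schema of c = join a by x, b by x; with a and b both (x, y, z).
        List<String> joined = Arrays.asList("a::x", "a::y", "a::z", "b::x", "b::y", "b::z");
        // Mirrors PluckTuple('a::','x','z') from the issue description.
        System.out.println(keptAliases(joined, "a::", "x", "z")); // prints [a::y]
    }
}
```

With no excluded columns the sketch keeps every alias under the prefix, which corresponds to the existing single-argument behavior.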
> Add columns to prune from PluckTuple
> ------------------------------------
>
> Key: PIG-4511
> URL: https://issues.apache.org/jira/browse/PIG-4511
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.14.0
> Reporter: Joseph Babcock
> Assignee: Joseph Babcock
> Priority: Minor
> Fix For: 0.15.0
>
> Attachments: pluckTuple.patch
>
>
> Currently PluckTuple returns all columns in a relation that match a prefix predicate. This patch allows a variable-length list of column names after the predicate; the named columns are removed from the result.
> Example:
> a = load 'a' as (x:int,y:chararray,z:long);
> b = load 'b' as (x:int,y:chararray,z:long);
> c = join a by x, b by x;
> Define pluck PluckTuple('a::','x','z');
> -- returns y from a only