Posted to dev@pig.apache.org by "Russell Jurney (JIRA)" <ji...@apache.org> on 2010/06/30 07:46:49 UTC

[jira] Created: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?
-------------------------------------------------------------------------------------------

                 Key: PIG-1476
                 URL: https://issues.apache.org/jira/browse/PIG-1476
             Project: Pig
          Issue Type: New Feature
    Affects Versions: 0.7.0
         Environment: sunny, 60% humidity with a chance of rain.
            Reporter: Russell Jurney
             Fix For: 0.8.0


After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

> DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  Choice wan:

> foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later, when wahn edits this file, it is hard to remember which position holds which field once wun has changed something upstream in the script.  So instead whun does this:

> foo = FOREACH foo GENERATE other_thing::f1 AS f1, other_thing::f2 AS f2, other_thing::f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

Whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn should do to avoid this situation:

> foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
> DESCRIBE foo;

   foo: {f1:int, f2:chararray, f3: int}

I think so, anyway.  I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884079#action_12884079 ] 

Thejas M Nair commented on PIG-1476:
------------------------------------

You are right that this sort of naming is there to deal with duplicate fields.

As long as the portion after :: is unique, you can omit the relation prefix (the other_thing:: part in the example below).
For example, this will work -
{code}
describe join_alias;
 {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}
f = foreach join_alias generate f1, f2;
describe f;
 {other_thing::f1:int, other_thing::f2:chararray}
{code}


Since that works, you can also do -
{code}
f = foreach join_alias generate f1 as f1, f2 as f2; -- yes, this does look strange!
describe f;
{f1:int, f2:chararray}
{code}
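
The lookup rule described in this comment can be modeled in a few lines of Python. This is only a sketch of the rule as stated here, not Pig's actual implementation: the function name resolve and the example schema (old_thing contributing f1 and f2, other_thing contributing f1 and f3) are assumptions made for illustration.

```python
def resolve(schema, name):
    """Return the one field in `schema` that `name` refers to.

    Hypothetical model of the rule: a reference matches a field when it
    equals the full name or equals the part after some '::' separator.
    """
    matches = [f for f in schema if f == name or f.endswith("::" + name)]
    if len(matches) != 1:
        # Zero matches: unknown field. Several matches: ambiguous short name.
        raise ValueError(f"{name!r} matches {len(matches)} fields: {matches}")
    return matches[0]


# Assumed schema after joining old_thing (f1, f2) with other_thing (f1, f3).
joined = ["old_thing::f1", "old_thing::f2", "other_thing::f1", "other_thing::f3"]

print(resolve(joined, "f2"))               # old_thing::f2
print(resolve(joined, "other_thing::f1"))  # other_thing::f1
```

In this model a short name such as f2 works because only one field ends in ::f2, while f1 alone raises an error because two fields share that suffix and the prefix is needed to tell them apart.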







[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

Posted by "Russell Jurney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Jurney updated PIG-1476:
--------------------------------

    Description: 
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

> DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

If oun were to let this chain, ouin could end up with first_thing::second_thing::third_thing::fourth_thing::f1, which is pretty hairy.

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  Choice wan:

> foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later, when wahn edits this file, it is hard to remember which position holds which field once wun has changed something upstream in the script.  So instead whun does this:

> foo = FOREACH foo GENERATE other_thing::f1 AS f1, other_thing::f2 AS f2, other_thing::f3 AS f3;

or

> foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn should do to avoid this situation:

> foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
> DESCRIBE foo;

   foo: {f1:int, f2:chararray, f3: int}

I think so, anyway.  I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.







[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

Posted by "Russell Jurney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Jurney updated PIG-1476:
--------------------------------

    Description: 
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

> DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  Choice wan:

> foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later, when wahn edits this file, it is hard to remember which position holds which field once wun has changed something upstream in the script.  So instead whun does this:

> foo = FOREACH foo GENERATE other_thing::f1 AS f1, other_thing::f2 AS f2, other_thing::f3 AS f3;

or

> foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn should do to avoid this situation:

> foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
> DESCRIBE foo;

   foo: {f1:int, f2:chararray, f3: int}

I think so, anyway.  I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.







[jira] Resolved: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

Posted by "Russell Jurney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Jurney resolved PIG-1476.
---------------------------------

    Resolution: Fixed

This is actually ok.

