Posted to dev@pig.apache.org by "Viraj Bhat (Created) (JIRA)" <ji...@apache.org> on 2012/01/31 04:04:10 UTC

[jira] [Created] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Order of execution of fs, store and sh commands in Pig is not maintained
------------------------------------------------------------------------

                 Key: PIG-2497
                 URL: https://issues.apache.org/jira/browse/PIG-2497
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.9.1
            Reporter: Viraj Bhat


I have a Pig script like this:
--Load data, process it and store to two outputs
{code}
a = load 'dummy.txt' as (cookie: chararray,timestamp: long,url: chararray);
b = group a by (cookie);
c = foreach b generate group, COUNT_STAR(a);
store c into '$COUNT_OUTPUT' using PigStorage();
store b into '$GRID_OUTPUT' using PigStorage();
--Remove local file, copy to local and remove processed file from grid
sh rm -rf '$LOCAL_OUTPUT';
fs -getmerge '$GRID_OUTPUT' '$LOCAL_OUTPUT';
fs -rmr '$GRID_OUTPUT';
{code}

Pig does not guarantee the order of command execution in the above script, i.e. the "store", "sh rm ...", "fs -getmerge ..." and "fs -rmr ..." commands are not necessarily executed in the order in which they are written.

Pig guarantees that "fs" commands and Pig "store" commands are executed in sequence. But "sh" commands are executed before anything else (in normal multi-query mode) because they run as soon as the parser sees them; they go through a different code path within Pig. This behavior needs to be changed.


Thanks
Viraj

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2497:
----------------------------

    Fix Version/s: 0.11
                   0.9.3
                   0.10
         Assignee: Daniel Dai
    

[jira] [Commented] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Daniel Dai (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196681#comment-13196681 ] 

Daniel Dai commented on PIG-2497:
---------------------------------

Seems we didn't enforce an exec for sh, but we did for fs.
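
One way to read this diagnosis, together with the description in the report, is as a toy model in Python. This is only a sketch under stated assumptions: the class ToyScriptRunner, its methods and the exec_before_sh flag are hypothetical names invented for illustration and are not Pig's parser or API, and the actual change in PIG-2497-1.patch is not shown in this thread. The model assumes that "store" statements are batched, that "fs" forces the pending batch to run first, and that "sh" either runs immediately (the reported behavior) or also forces the batch first (one plausible shape of a fix).

```python
# Toy model only: the names below are hypothetical and are not Pig's real API.
class ToyScriptRunner:
    def __init__(self, exec_before_sh):
        self.exec_before_sh = exec_before_sh  # True models "enforce an exec for sh"
        self.pending = []  # store statements batched for multi-query execution
        self.log = []      # order in which commands actually ran

    def run_batch(self):
        """Execute every batched store statement, in script order."""
        self.log.extend(self.pending)
        self.pending.clear()

    def store(self, name):
        # Stores are deferred so that several of them can run together.
        self.pending.append("store " + name)

    def fs(self, command):
        # Assumption from the thread: fs forces the pending batch to run first.
        self.run_batch()
        self.log.append("fs " + command)

    def sh(self, command):
        # Reported behavior: sh runs as soon as it is parsed, unless an
        # exec of the pending batch is enforced first.
        if self.exec_before_sh:
            self.run_batch()
        self.log.append("sh " + command)


def run_script(exec_before_sh):
    """Replay the shape of the reporter's script against the toy runner."""
    runner = ToyScriptRunner(exec_before_sh)
    runner.store("c")
    runner.store("b")
    runner.sh("rm -rf")
    runner.fs("-getmerge")
    runner.fs("-rmr")
    runner.run_batch()  # end of script: run anything still pending
    return runner.log


print(run_script(exec_before_sh=False))  # sh appears before both stores
print(run_script(exec_before_sh=True))   # commands appear in written order
```

In this model the "fs" commands always follow the stores, while "sh" only does so once the pending batch is forced to run before it.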
                

[jira] [Updated] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Viraj Bhat (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-2497:
----------------------------

    Description: added the closing {code} tag after the script; the wording is otherwise unchanged from the original report.

    

[jira] [Commented] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Viraj Bhat (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197175#comment-13197175 ] 

Viraj Bhat commented on PIG-2497:
---------------------------------

Daniel, can we target this patch for Pig 0.9.3 and Pig 0.10.1?
                

[jira] [Resolved] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Daniel Dai (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2497.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed

Unit tests pass. test-patch:
     [exec] -1 overall.
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     -1 release audit.  The applied patch generated 527 release audit warnings (more than the trunk's current 524 warnings).

The javadoc warning is unrelated. No new files were added, so the release audit warning can be ignored.

Patch committed to 0.9/0.10/trunk.
                

[jira] [Commented] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Thejas M Nair (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202052#comment-13202052 ] 

Thejas M Nair commented on PIG-2497:
------------------------------------

+1. Created PIG-2516 to track the other remaining issues in the sh implementation that Daniel found.
                

[jira] [Updated] (PIG-2497) Order of execution of fs, store and sh commands in Pig is not maintained

Posted by "Daniel Dai (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2497:
----------------------------

    Attachment: PIG-2497-1.patch
    