Posted to dev@pig.apache.org by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2008/02/25 05:00:51 UTC

[jira] Created: (PIG-120) support hadoop map reduce in local mode

support hadoop map reduce in local mode
---------------------------------------

                 Key: PIG-120
                 URL: https://issues.apache.org/jira/browse/PIG-120
             Project: Pig
          Issue Type: Bug
            Reporter: Stefan Groschupf


Currently Pig supports mapreduce and local as execution modes. LocalExecutionEngine is used for local and HExecutionEngine for map reduce. HExecutionEngine always expects that Hadoop runs as a cluster with a name node and jobtracker listening on a port.
However, Hadoop can also run in a local mode (LocalJobRunner), and supporting this would give several advantages.
First, it would speed up the test suite significantly. Second, it would make it possible to debug map reduce plans easily.
For example, we were able to debug and reproduce PIG-110 with this method.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-120) support hadoop map reduce in local mode

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Groschupf updated PIG-120:
---------------------------------

    Attachment: PIG-120_v_1.patch

This patch allows running Pig in mapreduce mode while using Hadoop's LocalJobRunner. It is certainly not the most elegant solution, but it is a starting point. As mentioned in PIG-121, I guess HExecutionEngine and co. need a cleanup anyhow.
This patch would be very useful for phase 1 of PIG-119. It would be great if we could get this into trunk as a starting point, since I guess PIG-121 will require some more discussion and work.





[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573426#action_12573426 ] 

Stefan Groschupf commented on PIG-120:
--------------------------------------

I'm very sorry for the confusion.
In general, *all the same tests run in every case*; we just switch execution engines and execution engine configurations.
ant -Dtest.mode=excelLocal -> runs the Pig local execution engine
ant -Dtest.mode=mapredLocal -> runs the Hadoop execution engine, but using Hadoop's LocalJobRunner. This should be the default, since the test suite would run in less than 50% of the time.
ant -Dtest.mode=mapredCluster -> runs the Hadoop execution engine with the minicluster.

My test case only tests whether it is possible to set "local" as nameNode and jobtracker, nothing else.

I guess we can find better names for the test modes.
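
As an illustration of the three test modes listed above, here is a minimal sketch of how a test harness might translate the test.mode property into an engine choice. The class TestModeSketch, its Setup holder, the forMode method and the "minicluster" marker value are all hypothetical and are not taken from the patch; only the mode names and engine names come from this thread.

```java
/**
 * Sketch only: maps the three test.mode values from the comment above to an
 * engine name and a cluster setting. Setup, forMode and the "minicluster"
 * marker are hypothetical and are not taken from the patch.
 */
final class TestModeSketch {

    /** Engine class name plus the cluster value passed to it (null if unused). */
    static final class Setup {
        final String engine;
        final String cluster;

        Setup(String engine, String cluster) {
            this.engine = engine;
            this.cluster = cluster;
        }
    }

    static Setup forMode(String mode) {
        switch (mode) {
            case "excelLocal":
                // Pig's own local engine, no Hadoop involved.
                return new Setup("LocalExecutionEngine", null);
            case "mapredLocal":
                // Hadoop engine, with the cluster set to "local" (LocalJobRunner).
                return new Setup("HExecutionEngine", "local");
            case "mapredCluster":
                // Hadoop engine against the test minicluster (placeholder value).
                return new Setup("HExecutionEngine", "minicluster");
            default:
                throw new IllegalArgumentException("unknown test.mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Mirrors `ant -Dtest.mode=...`; falls back to the mode proposed as default.
        String mode = System.getProperty("test.mode", "mapredLocal");
        Setup setup = forMode(mode);
        System.out.println(mode + " -> " + setup.engine + ", cluster=" + setup.cluster);
    }
}
```

The point of the sketch is that the test classes themselves stay the same in every mode; only the setup returned for the mode changes.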



[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573394#action_12573394 ] 

Stefan Groschupf commented on PIG-120:
--------------------------------------

Alan, sorry, I'm not sure I follow you. In general I see three ways we can run Pig: LocalExecutionEngine; HadoopExecutionEngine with map reduce on a Hadoop cluster; and HadoopExecutionEngine with map reduce using Hadoop's LocalJobRunner.
The third is very helpful for debugging and profiling, since the HadoopExecutionEngine is used but everything runs in the same JVM.
This patch makes that possible by not using a port when the name node and job tracker are "local", and by not opening a remote proxy to the jobtracker in that case.
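
The "local" special case described above can be sketched roughly as follows. This is not the code from PIG-120_v_1.patch: the class LocalAwareAddress, its method names, the example host name and the port numbers are hypothetical, chosen only to show the idea of skipping the port and the remote proxy when the value is "local".

```java
/**
 * Sketch only: treats "local" as a marker for Hadoop's in-process mode
 * instead of a host name. Class, method names and ports are hypothetical.
 */
final class LocalAwareAddress {

    static final String LOCAL = "local";

    static boolean isLocal(String value) {
        return LOCAL.equals(value);
    }

    /** Leaves "local" untouched; otherwise appends the port if none is given. */
    static String resolve(String host, int defaultPort) {
        if (isLocal(host)) {
            return LOCAL;
        }
        return host.contains(":") ? host : host + ":" + defaultPort;
    }

    /** A remote proxy to the jobtracker is only needed for a real cluster. */
    static boolean needsRemoteProxy(String jobTracker) {
        return !isLocal(jobTracker);
    }

    public static void main(String[] args) {
        // Host name and port below are placeholders, not values from the patch.
        System.out.println(resolve("local", 9001));
        System.out.println(resolve("jt.example.org", 9001));
        System.out.println(needsRemoteProxy("local"));
    }
}
```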

Does that make sense?



[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574168#action_12574168 ] 

Stefan Groschupf commented on PIG-120:
--------------------------------------

Yeah, just set the cluster name to local.
This patch is a beginning; I would love to support that more explicitly in the future.
A related problem, for example, is that today we cannot define a jobtracker and namenode on different hosts.

The configuration patch will solve some basic problems here. Please vote for it, for better Hadoop local mode support in the future. :)
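
To make "set the cluster name to local" a little more concrete, here is a minimal sketch using plain java.util.Properties. The keys "cluster" and "nameNode" follow the wording used in this thread; the class LocalHadoopProps and its methods are hypothetical, and how Pig actually reads these settings is an assumption not shown here.

```java
import java.util.Properties;

/**
 * Sketch only: the "cluster" and "nameNode" keys follow the wording used in
 * this thread; how Pig actually consumes them is not shown here.
 */
final class LocalHadoopProps {

    static Properties localHadoop() {
        Properties props = new Properties();
        props.setProperty("cluster", "local");   // per the thread: cluster name set to local
        props.setProperty("nameNode", "local");  // per the thread: name node set to local
        return props;
    }

    /** True only when both settings select Hadoop's local mode. */
    static boolean isHadoopLocal(Properties props) {
        return "local".equals(props.getProperty("cluster"))
            && "local".equals(props.getProperty("nameNode"));
    }

    public static void main(String[] args) {
        System.out.println(isHadoopLocal(localHadoop()));
    }
}
```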





[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573340#action_12573340 ] 

Alan Gates commented on PIG-120:
--------------------------------

Based on your comments in https://issues.apache.org/jira/browse/PIG-119, my understanding is that you want to be able to run the existing pig tests in hadoop's local mode.  But this patch provides a different test that runs in local mode.  Is this not a step in the wrong direction?



[jira] Resolved: (PIG-120) support hadoop map reduce in local mode

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-120.
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.1.0

Fix checked in as revision 633652.  Thanks Stefan.



[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573415#action_12573415 ] 

Alan Gates commented on PIG-120:
--------------------------------

If someone says:

ant -Dtest.mode=local

what happens?  Does it run *all the same tests* as usual, only using local mode hadoop?  Or does it run only tests that are specific to local mode hadoop?  I was envisioning the former, but your inclusion in the patch of a test (TestLocalMapReduce) that was specific to local mode made me think you were suggesting the latter.

Your changes to HExecutionEngine would support either, I think.



[jira] Commented: (PIG-120) support hadoop map reduce in local mode

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573654#action_12573654 ] 

Pi Song commented on PIG-120:
-----------------------------

+1 on the concept. Allowing Hadoop local execution mode will be very beneficial for testing with a subset of data before going into production. Theoretically, outputs from Pig local and Hadoop MapReduce should be exactly the same, but I have sometimes found that they differ. In such cases, I would trust local Hadoop more than Pig local for my development.

So, from the patch, if I want to run Pig on local Hadoop, I just have to set the cluster and nameNode properties to "local", right?



[jira] Updated: (PIG-120) support hadoop map reduce in local mode

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-120:
-------------------------------

    Patch Info: [Patch Available]
