Posted to user@pig.apache.org by Prashanth Pappu <pr...@conviva.com> on 2008/05/23 23:53:15 UTC

PIG doesn't fly

All:

I've seen a thread reporting a similar issue, and it was left unresolved.
So, here goes again -

(a) I'm trying to get PIG to connect to a HADOOP cluster and execute a
script.
(a.1) The hadoop-site.xml file is in /home/hadoop and the script is
/home/hadoop/tmp.pig

(b) PIG finds the data file in DFS but does not run any mapreduce jobs on
the cluster (of 6 nodes). Instead it runs all the mapreduce jobs using a
local job runner.

(c) What am I missing? How do I get PIG to schedule its mapred jobs on the
cluster?

Thanks,
Prashanth

>> verbose debug

java -cp /home/hadoop:pig.jar org.apache.pig.Main -v /home/hadoop/tmp.pig
2008-05-23 16:43:44,636 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: local
2008-05-23 16:43:44,656 [main] DEBUG org.apache.hadoop.conf.Configuration -
java.io.IOException: config()
        at
org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156)
        at
org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toConfiguration(ConfigurationUtil.java:14)
        at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:45)
        at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:36)
        at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:139)
        at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:106)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
        at org.apache.pig.PigServer.<init>(PigServer.java:149)
        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:43)
        at org.apache.pig.Main.main(Main.java:295)
...

>> simple debug

java -cp /home/hadoop:pig.jar org.apache.pig.Main  /home/hadoop/tmp.pig
2008-05-23 16:51:13,076 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: local
2008-05-23 16:51:13,386 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=
2008-05-23 16:51:14,038 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce
Job -----
2008-05-23 16:51:14,038 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input:
[/user/hadoop/prashanth/log1:PigStorage(','),
/user/hadoop/prashanth/log1:PigStorage(','),
/user/hadoop/prashanth/log1:PigStorage(',')]
2008-05-23 16:51:14,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map:
[[*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT
$38],[PROJECT $37],[PROJECT $32]}->GENERATE
{[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[FLOOR(GENERATE
{[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
$0],['2']})]})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT
$2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT
$3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE
{[org.apache.pig.impl.builtin.MULTIPLY(GENERATE
{[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE
{[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
$0],['2']})]})],['1']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT
$3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT
$1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE
{[org.apache.pig.impl.builtin.MULTIPLY(GENERATE
{[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE
{[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
$0],['2']})]})],['2']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT
$3],[PROJECT $2],[PROJECT $4]}]
2008-05-23 16:51:14,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
2008-05-23 16:51:14,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
2008-05-23 16:51:14,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
2008-05-23 16:51:14,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output:
/tmp/temp-711499347/tmp128716201:org.apache.pig.builtin.BinStorage
2008-05-23 16:51:14,040 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
2008-05-23 16:51:14,040 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism:
-1
2008-05-23 16:51:14,040 [main] INFO
org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce
parallelism: -1
2008-05-23 16:51:15,640 [Thread-14] INFO  org.apache.hadoop.mapred.MapTask -
numReduceTasks: 1
2008-05-23 16:51:16,531 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher
- Pig progress = 0%
2008-05-23 16:51:17,344 [Thread-14] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_1

Re: PIG doesn't fly

Posted by Prashanth Pappu <pr...@conviva.com>.
Thanks Alan. I fixed the cluster property in pig.properties and it worked. I
was simply using instructions from the get-started wiki and seem to have
missed editing the pig.properties file.
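
For the record, the edit described here amounts to something like the two
lines below in pig.properties. This is a sketch: the job tracker address is a
placeholder, not the actual cluster's.

```properties
# Submit jobs to the Hadoop cluster instead of the local job runner.
exectype=mapreduce
# hostname:port of the cluster's job tracker (placeholder address).
cluster=jobtracker.example.com:9001
```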

Thanks!
Prashanth

On Fri, May 23, 2008 at 3:21 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> A couple of questions.  Is the hadoop-site.xml in your classpath when you
> run pig?  In your pig.properties file, what do you have exectype set to?  It
> should be set to mapreduce.  What do you have cluster set to?  It should be
> the hostname:port for the job tracker of your cluster.
>
> Alan.

Re: PIG doesn't fly

Posted by Alan Gates <ga...@yahoo-inc.com>.
A couple of questions.  Is the hadoop-site.xml in your classpath when
you run pig?  In your pig.properties file, what do you have exectype set
to?  It should be set to mapreduce.  What do you have cluster set to?
It should be the hostname:port for the job tracker of your cluster.

Alan.
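
A quick way to check the two settings mentioned above is to grep them out of
pig.properties. This is only a sketch; it assumes pig.properties is in the
current directory, so adjust the path to wherever your Pig install keeps it.

```shell
# Print the exectype and cluster settings from pig.properties, or report
# which of them is missing (a missing or commented-out setting is exactly
# what makes Pig fall back to the local job runner).
props=pig.properties
for key in exectype cluster; do
    if grep -q "^${key}=" "$props"; then
        grep "^${key}=" "$props"
    else
        echo "missing: ${key}" >&2
    fi
done
```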
