Posted to user@pig.apache.org by Prashanth Pappu <pr...@conviva.com> on 2008/05/23 23:53:15 UTC
PIG doesn't fly
All:
I've seen a thread with a similar issue and it was left unresolved.
So, here it goes again -
(a) I'm trying to get PIG to connect to a HADOOP cluster and execute a
script.
(a.1) The hadoop-site.xml file is in /home/hadoop and the script is
/home/hadoop/tmp.pig
(b) PIG finds the data file in DFS but does not run any mapreduce jobs on
the cluster (of 6 nodes). Instead it runs all the mapreduce jobs using a
local job runner.
(c) What am I missing? How do I get PIG to schedule its mapred jobs on the
cluster?
Thanks,
Prashanth
>> verbose debug
java -cp /home/hadoop:pig.jar org.apache.pig.Main -v /home/hadoop/tmp.pig
2008-05-23 16:43:44,636 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
2008-05-23 16:43:44,656 [main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156)
        at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toConfiguration(ConfigurationUtil.java:14)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:45)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:36)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:139)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:106)
at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
at org.apache.pig.PigServer.<init>(PigServer.java:149)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:43)
at org.apache.pig.Main.main(Main.java:295)
...
>> simple debug
java -cp /home/hadoop:pig.jar org.apache.pig.Main /home/hadoop/tmp.pig
2008-05-23 16:51:13,076 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
2008-05-23 16:51:13,386 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2008-05-23 16:51:14,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce Job -----
2008-05-23 16:51:14,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: [/user/hadoop/prashanth/log1:PigStorage(','), /user/hadoop/prashanth/log1:PigStorage(','), /user/hadoop/prashanth/log1:PigStorage(',')]
2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['1']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['2']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}]
2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: /tmp/temp-711499347/tmp128716201:org.apache.pig.builtin.BinStorage
2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: -1
2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce parallelism: -1
2008-05-23 16:51:15,640 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 1
2008-05-23 16:51:16,531 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Pig progress = 0%
2008-05-23 16:51:17,344 [Thread-14] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_1
Re: PIG doesn't fly
Posted by Prashanth Pappu <pr...@conviva.com>.
Thanks, Alan. I fixed the cluster property in pig.properties and it worked. I
had simply been following the getting-started wiki and missed editing the
pig.properties file.
Thanks!
Prashanth
On Fri, May 23, 2008 at 3:21 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> A couple of questions. Is the hadoop-site.xml in your class path when you
> run pig? In your pig.properties file, what do you have exectype set to? It
> should be set to mapreduce. What do you have cluster set to? It should be
> the hostname:port for the job tracker of your cluster.
>
> Alan.
>
>
> Prashanth Pappu wrote:
>
>> All:
>>
>> I've seen a thread with a similar issue and it was left unresolved.
>> So, here it goes again -
>>
>> (a) I'm trying to get PIG to connect to a HADOOP cluster and execute a
>> script.
>> (a.1) The hadoop-site.xml file is in /home/hadoop and the script is
>> /home/hadoop/tmp/pig
>>
>> (b) PIG finds the data file in DFS but does not run any mapreduce jobs on
>> the cluster (of 6 nodes). Instead it runs all the mapreduce jobs using a
>> local job runner.
>>
>> (c) What am I missing? How do I get PIG to schedule its mapred jobs on the
>> cluster?
>>
>> Thanks,
>> Prashanth
>>
>>
>>
>>> verbose debug
>>>>
>>>>
>>>
>> java -cp /home/hadoop:pig.jar org.apache.pig.Main -v /home/hadoop/tmp.pig
>> 2008-05-23 16:43:44,636 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
>> Connecting
>> to hadoop file system at: local
>> 2008-05-23 16:43:44,656 [main] DEBUG org.apache.hadoop.conf.Configuration
>> -
>> java.io.IOException: config()
>> at
>> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156)
>> at
>>
>> org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toConfiguration(ConfigurationUtil.java:14)
>> at
>>
>> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:45)
>> at
>>
>> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:36)
>> at
>>
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:139)
>> at
>>
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:106)
>> at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
>> at org.apache.pig.PigServer.<init>(PigServer.java:149)
>> at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:43)
>> at org.apache.pig.Main.main(Main.java:295)
>> ...
>>
>>
>>
>>> simple debug
>>>>
>>>>
>>>
>> java -cp /home/hadoop:pig.jar org.apache.pig.Main /home/hadoop/tmp.pig
>> 2008-05-23 16:51:13,076 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
>> Connecting
>> to hadoop file system at: local
>> 2008-05-23 16:51:13,386 [main] INFO
>> org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
>> processName=JobTracker, sessionId=
>> 2008-05-23 16:51:14,038 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - -----
>> MapReduce
>> Job -----
>> 2008-05-23 16:51:14,038 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input:
>> [/user/hadoop/prashanth/log1:PigStorage(','),
>> /user/hadoop/prashanth/log1:PigStorage(','),
>> /user/hadoop/prashanth/log1:PigStorage(',')]
>> 2008-05-23 16:51:14,039 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map:
>> [[*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT
>> $38],[PROJECT $37],[PROJECT $32]}->GENERATE
>> {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[FLOOR(GENERATE
>> {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
>> $0],['2']})]})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT
>> $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT
>> $1]})],[PROJECT
>> $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE
>> {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE
>> {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE
>> {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
>> $0],['2']})]})],['1']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT
>> $3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT
>> $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE
>> {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE
>> {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE
>> {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT
>> $0],['2']})]})],['2']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT
>> $3],[PROJECT $2],[PROJECT $4]}]
>> 2008-05-23 16:51:14,039 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
>> 2008-05-23 16:51:14,039 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
>> 2008-05-23 16:51:14,039 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
>> 2008-05-23 16:51:14,039 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output:
>> /tmp/temp-711499347/tmp128716201:org.apache.pig.builtin.BinStorage
>> 2008-05-23 16:51:14,040 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
>> 2008-05-23 16:51:14,040 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map
>> parallelism:
>> -1
>> 2008-05-23 16:51:14,040 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce
>> parallelism: -1
>> 2008-05-23 16:51:15,640 [Thread-14] INFO org.apache.hadoop.mapred.MapTask
>> -
>> numReduceTasks: 1
>> 2008-05-23 16:51:16,531 [main] INFO
>>
>> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher
>> - Pig progress = 0%
>> 2008-05-23 16:51:17,344 [Thread-14] WARN
>> org.apache.hadoop.mapred.LocalJobRunner - job_local_1
>>
>>
>>
>
Re: PIG doesn't fly
Posted by Alan Gates <ga...@yahoo-inc.com>.
A couple of questions. Is the hadoop-site.xml in your class path when
you run pig? In your pig.properties file, what do you have exectype set
to? It should be set to mapreduce. What do you have cluster set to?
It should be the hostname:port for the job tracker of your cluster.
Alan.
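For anyone finding this thread later: a minimal sketch of the two pig.properties
entries Alan describes. The job tracker hostname and port below are placeholders,
not values from this thread — substitute your own cluster's job tracker address.

```properties
# Run jobs on the Hadoop cluster instead of the local job runner
exectype=mapreduce
# hostname:port of the cluster's job tracker (placeholder values)
cluster=jobtracker.example.com:9001
```

If exectype stays at local (or cluster is unset), Pig falls back to the
LocalJobRunner, which is what the "Connecting to hadoop file system at: local"
and "job_local_1" lines in the logs above indicate.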
Prashanth Pappu wrote:
> [...]