Posted to user@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2009/02/02 21:49:29 UTC

Trouble running pig on a cluster

Greetings everyone,
I installed, built, and am able to run the types-stable-2 tag on my
local machine.
I am also able to run Hadoop jobs on our Hadoop cluster.

However, when I try to run Pig on the cluster in map-reduce mode, it
outputs the following message:

dvryaboy@stout:~/PigOptimizer$ ./bin/pig  -x mapreduce -verbose
2009-02-02 15:34:25,887 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: hdfs://dbhadoopmg:9000/

Then it hangs (no prompt), and eventually exits.

I've set the PIGDIR, HADOOPDIR, and PIG_CLASSPATH variables; my
conf/pig.properties file is completely commented out.

The Hadoop namenode is running on port 9000, as indicated in the above message.

I imagine this is some configuration issue, possibly related to the
fact that our NameNode and JobTracker are running on non-standard
ports.

I am not sure what logs I need to be looking at to figure out what's
going on (and where those logs are) -- can anyone suggest the next
steps?

Thanks a lot,
-Dmitriy
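A hang right after "Connecting to hadoop file system" usually means the client never finishes the RPC handshake with the namenode. One quick first step (a sketch, not from the thread: it only tests raw TCP reachability of the port, using the host and port taken from the log line above — substitute your own) is:

```shell
# check_port HOST PORT: succeed (exit 0) if a TCP connection to HOST:PORT
# completes within 5 seconds, using bash's /dev/tcp pseudo-device.
check_port() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Host and port from the log line above; adjust for your cluster:
check_port dbhadoopmg 9000 && echo "namenode port reachable" \
                          || echo "cannot reach namenode port"
```

If the port is reachable but Pig still hangs, the problem is above TCP (e.g. an RPC/protocol version mismatch), which is in fact what this thread turns out to be.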

Re: Trouble running pig on a cluster

Posted by Alan Gates <ga...@yahoo-inc.com>.
Better than a note on the wiki would be to file a JIRA. Since the
default version is 0.18, the code should support that as the default
throughout. Good catch.

Alan.

On Feb 9, 2009, at 1:48 PM, Dmitriy Ryaboy wrote:



Re: Trouble running pig on a cluster

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Alan,
I tried your suggestion and it still didn't work -- BUT! looking at it
anew, I figured out the solution -- the bin/pig script still sets the
default Hadoop version to 17, while I am using 18. This doesn't matter
in local mode, naturally, but bites you when you go into hadoop mode.

Setting PIG_HADOOP_VERSION=-18 solves the issue.
Perhaps a note about this can be added to the wiki?

## before
dvryaboy@stout:~/PigOptimizer$ echo $PIG_HADOOP_VERSION

dvryaboy@stout:~/PigOptimizer$ ./bin/pig
2009-02-09 16:34:11,632 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: hdfs://dbhadoopmg:9000/
## hangs, ctrl-c to exit...

## after
dvryaboy@stout:~/PigOptimizer$ export PIG_HADOOP_VERSION=-18
dvryaboy@stout:~/PigOptimizer$ ./bin/pig
2009-02-09 16:35:02,337 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: hdfs://dbhadoopmg:9000/
2009-02-09 16:35:02,636 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: hdfs://dbhadoopmg:9001/
grunt> quit

-Dmitriy
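The fallback Dmitriy hit can be sketched in shell. This is illustrative only — the variable names and the exact default in the real bin/pig script may differ — but it shows the shape of the bug: with PIG_HADOOP_VERSION unset, an old default silently selects the 0.17 client jars, which cannot complete the handshake with an 0.18 namenode, hence the silent hang.

```shell
# Sketch of bin/pig's version selection (illustrative; names are assumed).
# An unset or empty PIG_HADOOP_VERSION falls back to "-17".
pick_hadoop_jar() {
  local version="${PIG_HADOOP_VERSION:--17}"   # fallback when unset or empty
  echo "lib/hadoop${version}.jar"
}

unset PIG_HADOOP_VERSION
pick_hadoop_jar                      # prints lib/hadoop-17.jar

export PIG_HADOOP_VERSION=-18
pick_hadoop_jar                      # prints lib/hadoop-18.jar
```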

On Mon, Feb 9, 2009 at 12:16 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

Re: Trouble running pig on a cluster

Posted by Alan Gates <ga...@yahoo-inc.com>.
Everything looks like it's in order. I looked through the code and it
only hard codes 8020 if a port wasn't specified for HDFS, but clearly
you specified a port. I can think of a couple of things to try:

1) Use a full host name for the HDFS host. When I glanced at my
configuration files I noticed it was using x.y.z.yahoo.com:8020 rather
than just x:8020. Presumably, as long as dbhadoopmg resolves from the
host you're running on it should work, but it's worth a try.

2) Run from java and see if it works (this is based on the assumption
that maybe PIG_CLASSPATH isn't being set correctly by the pig
script). So do:

java -Xmx512M -cp /hadoop/hadoop-current/conf:/home/dvryaboy/PigOptimizer/pig.jar org.apache.pig.Main

Alan.
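Suggestion 1 — that the short name may not resolve — can be checked directly on the client box. A sketch (the hostname is the one from this thread; getent consults the system resolver, which is roughly what the JVM ends up using for InetAddress lookups):

```shell
# resolves NAME: succeed if NAME resolves via the system resolver (NSS).
resolves() {
  getent hosts "$1" > /dev/null
}

# Short name from the thread; substitute your own namenode host:
resolves dbhadoopmg || echo "short name does not resolve; try the FQDN in fs.default.name"
```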

On Feb 3, 2009, at 11:58 AM, Dmitriy Ryaboy wrote:



Re: Trouble running pig on a cluster

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi Alan,
Thank you for taking a look at this!

Here is the PIG_CLASSPATH and the hadoop-site.xml

dvryaboy@stout:~/PigOptimizer$ echo $PIG_CLASSPATH

/home/dvryaboy/PigOptimizer//pig.jar:/hadoop/hadoop-current/conf/
dvryaboy@stout:~/PigOptimizer$ cat /hadoop/hadoop-current/conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://dbhadoopmg:9000/</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://dbhadoopmg:9001/</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>
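As a quick sanity check, the endpoints Pig will use can be pulled out of a hadoop-site.xml of this simple one-element-per-line shape with sed. A sketch (a real XML parser is safer for anything more complex than the file shown above):

```shell
# site_values FILE: print every <value>...</value> from a hadoop-site.xml
# that keeps one XML element per line, as in the file shown above.
site_values() {
  sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' "$1"
}

# e.g.: site_values /hadoop/hadoop-current/conf/hadoop-site.xml
# should print the three values: hdfs://dbhadoopmg:9000/,
# hdfs://dbhadoopmg:9001/ and 1.
```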


On Tue, Feb 3, 2009 at 1:28 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

Re: Trouble running pig on a cluster

Posted by Alan Gates <ga...@yahoo-inc.com>.
Normally you'd expect the next message you see to be that it is
connecting to the JobTracker. Pig doesn't really do anything between
connecting to HDFS and the job tracker, so I would assume it is
failing to connect to HDFS.

Where is the hadoop-site.xml file that contains the info for your job
tracker and namenode? Is the directory containing that file in your
PIG_CLASSPATH?

Alan.
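Alan's second question can be answered mechanically: walk each colon-separated PIG_CLASSPATH entry and report which ones actually contain a hadoop-site.xml. A sketch (jar entries simply won't match; only directories holding the file are printed):

```shell
# find_site CLASSPATH: print every classpath entry (colon-separated)
# that contains a hadoop-site.xml file.
find_site() {
  local IFS=':'
  local entry
  for entry in $1; do
    entry="${entry%/}"                       # tolerate a trailing slash
    if [ -f "$entry/hadoop-site.xml" ]; then
      echo "$entry/hadoop-site.xml"
    fi
  done
}

# e.g.: find_site "$PIG_CLASSPATH"
# With the PIG_CLASSPATH shown in this thread, it should print
# /hadoop/hadoop-current/conf/hadoop-site.xml if the file is there.
```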

On Feb 2, 2009, at 12:49 PM, Dmitriy Ryaboy wrote:
