Posted to user@pig.apache.org by Lance Riedel <la...@dotspots.com> on 2009/04/29 23:02:50 UTC

Pig on standalone hadoop 19.1

Hi all,
I'm having issues trying to run pig on a standalone hadoop cluster.
The cluster is running 19.1, but I have applied the following patch:
http://issues.apache.org/jira/browse/PIG-573

When I start pig, I get the following  (infinite loop):

2009-04-29 16:55:27,220 [main] WARN org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Failed to create HOD configuration directory - /tmp/PigHod.domU-12-31-38-00-C4-31.dotspots.3204342197969613Retrying...
2009-04-29 16:55:27,267 [main] WARN org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Failed to create HOD configuration directory - /tmp/PigHod.domU-12-31-38-00-C4-31.dotspots.3204342261033613Retrying...
2009-04-29 16:55:27,312 [main] WARN org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Failed to create HOD configuration directory - /tmp/PigHod.domU-12-31-38-00-C4-31.dotspots.3204342307923613Retrying...
2009-04-29 16:55:27,357 [main] WARN org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Failed to create HOD configuration directory - /tmp/PigHod.domU-12-31-38-00-C4-31.dotspots.3204342352914613Retrying...

More info:
I am using the hadoop19.jar from the jira in my pig/lib dir.

/////////////////Environment:
export HADOOP_HOME=/mnt/dist/app/hadoop-0.19.1
export PIGDIR=/dist/app/pig
export PIG_CLASSPATH=$HADOOP_HOME/conf/
export PIG_HADOOP_VERSION=19
export PATH=/dist/app/apache-ant-1.7.1/bin:$PATH:/mnt/dist/app/hadoop-0.19.1/bin:$PIGDIR/bin


////////////////// pig.properties

# Pig configuration file. All values can be overwritten by command line arguments.
# see bin/pig -help

# log4jconf log4j configuration file
# log4jconf=./conf/log4j.properties

# brief logging (no timestamps)
brief=false

# clustername, name of the hadoop jobtracker. If no port is defined port 50020 will be used.
cluster=ec2-75-101-247-52.compute-1.amazonaws.com:54311    # added this later, nothing changed
fs.default.name=hdfs://ec2-75-101-247-52.compute-1.amazonaws.com:54310  # added this later, nothing changed
mapred.job.tracker=ec2-75-101-247-52.compute-1.amazonaws.com:54311   # added this later, nothing changed
#debug level, INFO is default
debug=DEBUG

# a file that contains pig script
#file=

# load jarfile, colon separated
#jar=

#verbose print all log messages to screen (default to print only INFO and above to screen)
verbose=false

#exectype local|mapreduce, mapreduce is default
exectype=mapreduce
# hod related properties
#ssh.gateway
#hod.expect.root
#hod.expect.uselatest
#hod.command
#hod.config.dir
#hod.param


#Do not spill temp files smaller than this size (bytes)
pig.spill.size.threshold=5000000
#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)
#This should help reduce the number of files being spilled.
pig.spill.gc.activation.size=40000000


######################
# Everything below this line is Yahoo specific.  Note that I've made
# (almost) no changes to the lines above to make merging in from Apache
# easier.  Any values I don't want from above I override below.
#
# This file is configured for use with HOD on the production clusters.  If you
# want to run pig with a static cluster you will need to remove everything
# below this line and set the cluster value (above) to the
# hostname and port of your job tracker.


RE: Pig on standalone hadoop 19.1

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Hi Lance,

Sorry for this long exchange. Let me make sure I understand.

You have a pig.jar that works fine on Hadoop 18. When you apply the
patch for Hadoop 19 to the same code and run an identical job (with an
identical command line) against a Hadoop 19 cluster, you see the
problem with connecting to HOD. Is this correct?

Did you try, as Alan suggested, adding the property on the command line
just to see if it solves the issue?

The properties file from your original message is the one you included
in pig.jar, right?

I know that Pradeep, who created the patch, ran it against a 1-node
Hadoop 19 cluster and did not see this issue, so I suspect it is
somehow related to your setup.

If we can't figure this out over email, I will set up a Hadoop 19
cluster over the weekend and try it out.

Thanks, 

Olga 


Re: Pig on standalone hadoop 19.1

Posted by Lance Riedel <la...@dotspots.com>.
Hi Olga,
I haven't had any problems on 18. Unfortunately our cluster is on 19.1
for a number of other reasons (I don't have them all), since Pig is not
our primary requirement.

Pig is something we want to use for some additional analytics/
reporting, but it comes after our indexing and data-processing needs,
which are already in place on 19.1.

Thanks,
Lance


RE: Pig on standalone hadoop 19.1

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Hi Lance,

Is there a way you can use Pig with a Hadoop 18 cluster? Most of our
testing at this point is done on 18, so we have much more experience
with it.

Olga 


Re: Pig on standalone hadoop 19.1

Posted by Lance Riedel <la...@dotspots.com>.
Hi Alan,
Thanks - I had put that value in my pig.properties (see the file listed
in my original message), but it didn't seem to help there. Maybe I'm
not configuring the use of the properties file correctly?

The command line option DOES work, however, so I'm going to add it to
my scripts as a workaround for now.

Thanks,
Lance
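The workaround Lance describes (adding the option to his scripts) might look like the hypothetical wrapper below; `run_pig`, `PIG_CMD`, and `report.pig` are illustrative names, not from the thread:

```shell
# Hypothetical wrapper: inject -Dhod.server="" into every Pig invocation
# so the HOD configuration-directory retry loop is skipped.
# PIG_CMD defaults to "pig" but can be overridden (e.g. for testing).
PIG_CMD="${PIG_CMD:-pig}"

run_pig() {
  # Forward all caller arguments (script name, parameters) after the -D flag.
  "$PIG_CMD" -Dhod.server="" "$@"
}

# Example use: run_pig report.pig
```

Keeping the flag in one wrapper means each analytics script stays unchanged while the HOD lookup remains disabled.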



Re: Pig on standalone hadoop 19.1

Posted by Alan Gates <ga...@yahoo-inc.com>.
Can you try invoking pig with -Dhod.server="".  If that works, you  
should be able to set that value in your pig.properties file.

Alan.
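In properties-file form, Alan's suggestion would be the single line below (a sketch: this mirrors what -Dhod.server="" sets on the command line; as the thread shows, whether Pig actually picks it up from pig.properties was the open question):

```
# Disable the HOD server lookup so Pig talks to the static cluster directly
hod.server=
```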
