Posted to dev@pig.apache.org by Craig Macdonald <cr...@dcs.gla.ac.uk> on 2007/11/28 20:38:09 UTC

How to HOD?

Hi all,

I've been trying to set up Pig using Hadoop on Demand. Using some
hackery, my incantation now looks like

PATH=/users/tr.craigm/OF_tools/python/bin/:$PATH ROOT=$PWD \
  scripts/pig.pl -Dlog4j.level=debug -Dhod.server=local \
  -Dhod.expect.root=$PWD -Dhod.command=hod/bin/hod \
  -Dhod.expect.uselatest=hodrc/released -Dyinst.cluster= \
  -Dhadoop.root.logger=DEBUG,console --cluster hodrc

(the name of my hodrc file is hodrc).

However, the HOD connection code in PigContext mystifies me. Does it
correspond to any released version of HOD?
It seems to connect to HOD and parse the response.

PIG-18 (https://issues.apache.org/jira/browse/PIG-18) states that Pig
needs to be fixed to work with HOD 4, so I presume that Pig does not
work with the HOD version hod-open-4.tar.gz attached to
https://issues.apache.org/jira/browse/HADOOP-1301

But it doesn't look like Pig works with the other version of HOD
attached to the same JIRA issue, hod.0.2.2.tar.gz, either.

PigContext.java looks for HOD output in the form of lines starting with:
hdfsUI:
hdfs:
mapredUI:
mapred:
hadoopConf:
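To make the expected format concrete, here is a hypothetical sketch (not Pig's actual PigContext code) of collecting those lines. It assumes each line is "key: value" with the value after the first colon; the class and method names are invented for illustration. Matching on the full prefix including the colon matters, because "hdfsUI:" and "hdfs:" share a stem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick out the "key: value" lines Pig expects HOD to print.
final class HodOutputParser {
    // Prefixes taken from the list above (without the trailing colon).
    static final String[] KEYS = {"hdfsUI", "hdfs", "mapredUI", "mapred", "hadoopConf"};

    static Map<String, String> parse(String output) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String line : output.split("\r?\n")) {
            String trimmed = line.trim();
            for (String key : KEYS) {
                // Compare against "key:" so an "hdfsUI:" line is never taken for "hdfs:".
                String prefix = key + ":";
                if (trimmed.startsWith(prefix)) {
                    found.put(key, trimmed.substring(prefix.length()).trim());
                    break;
                }
            }
        }
        return found;
    }

    // Keys HOD did not report; a non-empty result means the expected output is incomplete.
    static List<String> missing(Map<String, String> found) {
        List<String> absent = new ArrayList<>();
        for (String key : KEYS) {
            if (!found.containsKey(key)) {
                absent.add(key);
            }
        }
        return absent;
    }
}
```

Lines that match none of the prefixes (log noise, prompts) are simply skipped.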

I can't find any source in either version of HOD that produces output like this.
Does anyone know whether Pig currently works with any openly
available version of HOD?

Thanks in advance

Craig

Re: How to HOD?

Posted by Craig Macdonald <cr...@dcs.gla.ac.uk>.
Comments on the expect script:

Where hod is spawned:
    spawn -ignore {SIGHUP} \
        /users/grad/craigm/src/pig/FROMApache/hod0.2/hod.0.2.2/bin/hod -n \
        [lindex $args 0] [lindex $args 1] [lindex $args 2] [lindex $args 3] \
        [lindex $args 4] [lindex $args 5] [lindex $args 6] [lindex $args 7] \
        [lindex $args 8] [lindex $args 9] [lindex $args 10]

If there are fewer than 11 arguments to the expect script, [lindex $args N]
returns an empty string for each missing index, so expect still passes
those extra, empty argv entries when it calls exec(3). This confuses HOD's
Python command-line parsing, which expects no leftover command-line
arguments: empty entries still count towards "len(args) > 0". I fixed
this by patching around line 71 of hod.0.2.2/hodlib/Common/cfg.py:

     options, args = op.parse_args(argv[1:])
+    argsNoblanks = []
+    for a in args:
+        if len(a) > 0:
+            argsNoblanks.append(a)
-    if len(args) > 0:
+    if len(argsNoblanks) > 0:
+        print "\nunrecognised argument(s): "
+        print argsNoblanks
         op.print_help()
         sys.exit(1)

A better fix would probably be to correct the expect script itself,
but my attempts at that failed.
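The same blank-argument filtering that the cfg.py patch performs can also be sketched on the calling side. This assumes hod were launched directly from Java with java.lang.ProcessBuilder and an argument list, which is not how Pig currently does it (Ben explains elsewhere in this thread that expect is used because HOD did not cope well with piped input and output). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): build the hod argv as a list,
// dropping null or empty arguments before the process is started.
final class HodCommand {
    static List<String> build(String hodPath, String... args) {
        List<String> argv = new ArrayList<>();
        argv.add(hodPath);
        argv.add("-n"); // same leading option the expect script passes to hod
        for (String a : args) {
            // Mirrors the len(a) > 0 check in the cfg.py patch above.
            if (a != null && !a.isEmpty()) {
                argv.add(a);
            }
        }
        return argv;
    }
}
```

Since each list element becomes exactly one argv entry in `new ProcessBuilder(argv).start()`, blank entries never reach HOD's option parser, whereas a fixed run of [lindex ...] calls always produces 11 of them.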

There are some rather odd Yahoo-specific bits in places, such as a hard-coded domain:
@@ -349,9 +350,9 @@
     }
     private String fixUpDomain(String hostPort) throws UnknownHostException {
         String parts[] = hostPort.split(":");
-        if (parts[0].indexOf('.') == -1) {
-            parts[0] = parts[0] + ".inktomisearch.com";
-        }
+        //if (parts[0].indexOf('.') == -1) {
+        //    parts[0] = parts[0] + ".inktomisearch.com";
+        //}
         InetAddress.getByName(parts[0]);
         return parts[0] + ":" + parts[1];
     }
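Rather than commenting the suffix out, it could be made optional. In this sketch the domain comes from a system property whose name, hod.domain.suffix, is invented here (it is not a real Pig or HOD setting); when it is unset, host names are left alone. The pure qualify step is split out so it can be checked without DNS, and fixUpDomain still does the same InetAddress.getByName resolution check as the original. Unlike the original, it also tolerates a host with no port.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical variant of fixUpDomain without a hard-coded domain.
final class HostPortFixer {
    // Invented property name for this sketch only.
    static final String SUFFIX_PROPERTY = "hod.domain.suffix";

    // Append the suffix to unqualified host names, but only if one is configured.
    static String qualify(String hostPort, String suffix) {
        String[] parts = hostPort.split(":", 2);
        String host = parts[0];
        if (suffix != null && !suffix.isEmpty() && host.indexOf('.') == -1) {
            host = host + (suffix.startsWith(".") ? suffix : "." + suffix);
        }
        return parts.length > 1 ? host + ":" + parts[1] : host;
    }

    // Same shape as the original: qualify, then check that the name resolves.
    static String fixUpDomain(String hostPort) throws UnknownHostException {
        String fixed = qualify(hostPort, System.getProperty(SUFFIX_PROPERTY));
        InetAddress.getByName(fixed.split(":", 2)[0]); // throws if the host cannot be resolved
        return fixed;
    }
}
```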

Also, a NullPointerException occurs at the marked line if the yinst.cluster property is not set:
@@ -250,7 +250,7 @@
                        cmd.append('/');
             cmd.append(System.getProperty("hod.command"));
             //String cmd = System.getProperty("hod.command", "/home/breed/startHOD.expect");
-                       String cluster = System.getProperty("yinst.cluster");
+                       String cluster = System.getProperty("yinst.cluster"); //NPE here if property not set
                        if (cluster.length() > 0 && !cluster.startsWith("kryptonite")) {
                                cmd.append(" --config=");
                                cmd.append(System.getProperty("hod.config.dir"));
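A minimal null-safe sketch of that check, assuming cmd is a StringBuilder (its type is not shown in the excerpt); the class and method names are hypothetical. System.getProperty(key, def) returns def when the property is unset, so leaving out -Dyinst.cluster behaves the same as passing -Dyinst.cluster= on the command line.

```java
// Hypothetical null-safe version of the yinst.cluster check above.
final class ClusterConfig {
    // Same condition as the original, applied to a value that is never null.
    static boolean wantsConfigDir(String cluster) {
        return cluster.length() > 0 && !cluster.startsWith("kryptonite");
    }

    static void appendConfig(StringBuilder cmd) {
        // Default to "" instead of null when -Dyinst.cluster is absent.
        String cluster = System.getProperty("yinst.cluster", "");
        if (wantsConfigDir(cluster)) {
            cmd.append(" --config=");
            cmd.append(System.getProperty("hod.config.dir"));
        }
    }
}
```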


Thanks

Craig


RE: How to HOD?

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Yes, that's the plan. HOD 0.4 has a completely different interface that
would not require us to use expect. Once HOD 0.4 is released, the plan
is to upgrade Pig to work with it, as the bug (PIG-18) indicates. This
should happen in January.

Olga 

-----Original Message-----
From: Craig Macdonald [mailto:craigm@dcs.gla.ac.uk] 
Sent: Wednesday, November 28, 2007 12:04 PM
To: Benjamin Reed
Cc: pig-dev@incubator.apache.org
Subject: Re: How to HOD?

Hi Ben,

OK, d'oh moment from me. Thanks for the script, but the hint was enough
to remind me that there's a similar version already in SVN trunk.
I don't think there's any need for a separate issue, as upgrading Pig to
work with HOD 4 is already an open issue. Presumably an upgraded Pig would
no longer require the expect script (though notably, I don't think HOD 4
produces all the required output ;-)

Ta muchly

Craig


Re: How to HOD?

Posted by Craig Macdonald <cr...@dcs.gla.ac.uk>.
Hi Ben,

OK, d'oh moment from me. Thanks for the script, but the hint was enough
to remind me that there's a similar version already in SVN trunk.
I don't think there's any need for a separate issue, as upgrading Pig to
work with HOD 4 is already an open issue. Presumably an upgraded Pig would
no longer require the expect script (though notably, I don't think HOD 4
produces all the required output ;-)

Ta muchly

Craig


Re: How to HOD?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
Ah yes, sorry about that. We had a problem with HOD not working well with 
piped inputs and outputs, so we actually use an expect script to interface to 
hod. (We should open an issue on this.)

I'm attaching the script that we use.

ben
