Posted to dev@mahout.apache.org by Aurora Skarra-Gallagher <au...@yahoo-inc.com> on 2010/01/27 00:43:05 UTC

PFPGrowth - not able to pass hadoop any parameters

Hi,

I'm using the PFPGrowth code (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it works fine on my local box. However, when I try to get it to run on our grid cluster, it amazingly does not allow any parameters to be passed to Hadoop. When I look at the code (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java), I see that there is no way to pass custom configuration parameters (like -Dmapred.job.queue.name=X or -libjars or any other parameter for that matter).

I am shocked that it would be done this way. To get this to work, I need to go change the actual PFPGrowth.java file, add my conf.set("key", "val") lines, and recompile. Is there any other way to do this? Why would it be written in such a way that all hadoop parameters are disallowed?

Thanks,
Aurora

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Jake Mannix <ja...@gmail.com>.

Yeah, this was one of my thoughts with MAHOUT-185 - turn some of our Driver
classes to just fire off a Tool.  It is very convenient to be able to do
this, and it's becoming more standard as well.

I need to dig up my stuff in decomposer/contrib-hadoop and pull that in and
integrate it with Drew's patch on that ticket.

  -jake

On Tue, Jan 26, 2010 at 7:19 PM, Robin Anil <ro...@gmail.com> wrote:

> Mahout algorithms are not using ToolRunner of Hadoop. I guess many core
> hadoop-ers like that feature. I think we should be supporting that feature
> by 0.3
>
>
> Robin
>
> On Wed, Jan 27, 2010 at 5:59 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > These look like Hadoop params, to the hadoop command? why wouldn't
> > hadoop be parsing those, or, why would the Job command have to shuttle
> > them to Hadoop? I thought these were typically set in the config .xml
> > files anyhow.
> >
> > On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher
> > <au...@yahoo-inc.com> wrote:
> > > Hi,
> > >
> > > I'm using the PFPGrowth code
> > > (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it
> > > works fine on my local box. However, when I try to get it to run on our
> > > grid cluster, it amazingly does not allow any parameters to be passed to
> > > Hadoop. When I look at the code
> > > (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java),
> > > I see that there is no way to pass custom configuration parameters (like
> > > -Dmapred.job.queue.name=X or -libjars or any other parameter for that
> > > matter).
> > >
> > > I am shocked that it would be done this way. To get this to work, I need
> > > to go change the actual PFPGrowth.java file, add my conf.set("key", "val")
> > > lines, and recompile. Is there any other way to do this? Why would it be
> > > written in such a way that all hadoop parameters are disallowed?
> > >
> > > Thanks,
> > > Aurora
> > >
> >
>
>
>
> --
> ------
> Robin Anil
> Blog: http://techdigger.wordpress.com
> -------
> Try out Swipeball for iPhone
> Video: http://www.youtube.com/watch?v=3hvEbWHciwU
> iTunes: http://itunes.com/apps/swipeball
>

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Robin Anil <ro...@gmail.com>.

Mahout algorithms are not using Hadoop's ToolRunner. I guess many core
Hadoop users like that feature. I think we should support it by 0.3.


Robin

On Wed, Jan 27, 2010 at 5:59 AM, Sean Owen <sr...@gmail.com> wrote:

> These look like Hadoop params, to the hadoop command? why wouldn't
> hadoop be parsing those, or, why would the Job command have to shuttle
> them to Hadoop? I thought these were typically set in the config .xml
> files anyhow.
>
> On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher
> <au...@yahoo-inc.com> wrote:
> > Hi,
> >
> > I'm using the PFPGrowth code
> > (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it
> > works fine on my local box. However, when I try to get it to run on our
> > grid cluster, it amazingly does not allow any parameters to be passed to
> > Hadoop. When I look at the code
> > (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java),
> > I see that there is no way to pass custom configuration parameters (like
> > -Dmapred.job.queue.name=X or -libjars or any other parameter for that
> > matter).
> >
> > I am shocked that it would be done this way. To get this to work, I need
> > to go change the actual PFPGrowth.java file, add my conf.set("key", "val")
> > lines, and recompile. Is there any other way to do this? Why would it be
> > written in such a way that all hadoop parameters are disallowed?
> >
> > Thanks,
> > Aurora
> >
>



-- 
------
Robin Anil
Blog: http://techdigger.wordpress.com
-------
Try out Swipeball for iPhone
Video: http://www.youtube.com/watch?v=3hvEbWHciwU
iTunes: http://itunes.com/apps/swipeball

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Jake Mannix <ja...@gmail.com>.

I wouldn't say that they're being *hijacked* - the class just doesn't
implement Tool, so it doesn't let them become hadoop parameters to be
passed into the Configuration.

I agree that we should, because while it's not an API requirement that
Tool / ToolRunner be used, it's pretty standard at this point, for exactly
the reasons you're mentioning.

  -jake

On Wed, Jan 27, 2010 at 10:26 AM, Aurora Skarra-Gallagher <
aurora@yahoo-inc.com> wrote:

> These are hadoop parameters, and the OptionBuilder used in the PFPGrowth
> examples hijacks them.
>
> -Aurora
>
>
> On 1/26/10 4:29 PM, "Sean Owen" <sr...@gmail.com> wrote:
>
> These look like Hadoop params, to the hadoop command? why wouldn't
> hadoop be parsing those, or, why would the Job command have to shuttle
> them to Hadoop? I thought these were typically set in the config .xml
> files anyhow.
>
> On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher
> <au...@yahoo-inc.com> wrote:
> > Hi,
> >
> > I'm using the PFPGrowth code
> > (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it
> > works fine on my local box. However, when I try to get it to run on our
> > grid cluster, it amazingly does not allow any parameters to be passed to
> > Hadoop. When I look at the code
> > (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java),
> > I see that there is no way to pass custom configuration parameters (like
> > -Dmapred.job.queue.name=X or -libjars or any other parameter for that
> > matter).
> >
> > I am shocked that it would be done this way. To get this to work, I need
> > to go change the actual PFPGrowth.java file, add my conf.set("key", "val")
> > lines, and recompile. Is there any other way to do this? Why would it be
> > written in such a way that all hadoop parameters are disallowed?
> >
> > Thanks,
> > Aurora
> >
>
>

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Robin Anil <ro...@gmail.com>.

Actually, Hadoop parameters can be overridden using ToolRunner, which
parses the command line and puts the Hadoop settings into the Configuration
object. So you can set things like the scheduler queue, mappers/system, and
mapred child JVM options directly from the command line. Or we can set them
in the cluster configuration itself. We haven't gotten around to using
ToolRunner yet. I guess it's high time we started doing that (not much work,
I think). It will become easier to run and tune Mahout on any cluster by
using the command line flags.
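
As a rough illustration of what that command-line parsing amounts to, here
is a minimal sketch in plain Java. It is a simplified stand-in, not Hadoop's
actual GenericOptionsParser (which is what ToolRunner uses under the hood),
and it only handles -D overrides, not -libjars or the other generic options.
The class name GenericOptionsSketch, the Parsed holder, and the
--input/--output job arguments are all made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; this is NOT Hadoop's GenericOptionsParser.
class GenericOptionsSketch {

    /** Result of splitting the command line: config overrides plus the job's own arguments. */
    static final class Parsed {
        final Map<String, String> conf = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Pulls "-Dkey=value" and "-D key=value" pairs out of args; everything else is passed through. */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length && args[i + 1].indexOf('=') > 0) {
                pair = args[++i];          // form: -D key=value (consumes the next argument)
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                pair = arg.substring(2);   // form: -Dkey=value
            }
            if (pair != null) {
                int eq = pair.indexOf('=');
                parsed.conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            } else {
                parsed.remaining.add(arg); // not a config override: left for the job's own parser
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Hypothetical command line: one Hadoop override followed by job-specific options.
        Parsed p = parse(new String[] {
            "-Dmapred.job.queue.name=X", "--input", "in", "--output", "out"});
        System.out.println("conf overrides: " + p.conf);
        System.out.println("job arguments:  " + p.remaining);
    }
}
```

With a real driver, the equivalent would be to have the class implement Tool
and launch it through ToolRunner, so that the overrides end up in the
Configuration the job reads and only the leftover arguments reach the
driver's own option parsing.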

Robin

On Thu, Jan 28, 2010 at 12:49 AM, Sean Owen <sr...@gmail.com> wrote:

> Someone clear up my misunderstanding... that's a JVM or hadoop
> parameter no? why would it go to the Job at all (i.e. this goes before
> the class name of the Job right?)
>
> On Wed, Jan 27, 2010 at 6:26 PM, Aurora Skarra-Gallagher
> <au...@yahoo-inc.com> wrote:
> > These are hadoop parameters, and the OptionBuilder used in the PFPGrowth
> examples hijacks them.
> >
>

-- 
------
Robin Anil
Blog: http://techdigger.wordpress.com
-------
Try out Swipeball for iPhone
http://itunes.com/apps/swipeball
Mahout in Action - Early Access
http://www.manning.com/owen

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Sean Owen <sr...@gmail.com>.

Someone clear up my misunderstanding... that's a JVM or Hadoop
parameter, no? Why would it go to the Job at all (i.e. this goes before
the class name of the Job, right?)

On Wed, Jan 27, 2010 at 6:26 PM, Aurora Skarra-Gallagher
<au...@yahoo-inc.com> wrote:
> These are hadoop parameters, and the OptionBuilder used in the PFPGrowth examples hijacks them.
>

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Aurora Skarra-Gallagher <au...@yahoo-inc.com>.

These are hadoop parameters, and the OptionBuilder used in the PFPGrowth examples hijacks them.

-Aurora


On 1/26/10 4:29 PM, "Sean Owen" <sr...@gmail.com> wrote:

These look like Hadoop params, to the hadoop command? why wouldn't
hadoop be parsing those, or, why would the Job command have to shuttle
them to Hadoop? I thought these were typically set in the config .xml
files anyhow.

On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher
<au...@yahoo-inc.com> wrote:
> Hi,
>
> I'm using the PFPGrowth code (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it works fine on my local box. However, when I try to get it to run on our grid cluster, it amazingly does not allow any parameters to be passed to Hadoop. When I look at the code (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java), I see that there is no way to pass custom configuration parameters (like -Dmapred.job.queue.name=X or -libjars or any other parameter for that matter).
>
> I am shocked that it would be done this way. To get this to work, I need to go change the actual PFPGrowth.java file, add my conf.set("key", "val") lines, and recompile. Is there any other way to do this? Why would it be written in such a way that all hadoop parameters are disallowed?
>
> Thanks,
> Aurora
>

Re: PFPGrowth - not able to pass hadoop any parameters

Posted by Sean Owen <sr...@gmail.com>.

These look like Hadoop params, to the hadoop command? Why wouldn't
hadoop be parsing those, or why would the Job command have to shuttle
them to Hadoop? I thought these were typically set in the config .xml
files anyhow.

On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher
<au...@yahoo-inc.com> wrote:
> Hi,
>
> I'm using the PFPGrowth code (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it works fine on my local box. However, when I try to get it to run on our grid cluster, it amazingly does not allow any parameters to be passed to Hadoop. When I look at the code (mahout/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java), I see that there is no way to pass custom configuration parameters (like -Dmapred.job.queue.name=X or -libjars or any other parameter for that matter).
>
> I am shocked that it would be done this way. To get this to work, I need to go change the actual PFPGrowth.java file, add my conf.set("key", "val") lines, and recompile. Is there any other way to do this? Why would it be written in such a way that all hadoop parameters are disallowed?
>
> Thanks,
> Aurora
>
