Posted to user@mahout.apache.org by Adil Aijaz <ad...@yahoo-inc.com> on 2009/09/15 00:19:11 UTC

Plans For Migrating Drivers to Tool interface?

Hi folks,

I recently merged my vendor branch of Mahout with Mahout trunk and
found that Mahout now supports Hadoop 0.20. With Hadoop 0.20, we can
use the capacity scheduler instead of HOD (Hadoop On Demand). There are
two ways to pass the capacity scheduler queue name to a Mahout driver
class like KMeansDriver:

1. Have KMeansDriver extend 'Configured' and implement the 'Tool' interface
to allow command-line specification of the scheduler queue name, as in
-Dmapred.job.queue.name=myqueuename
2. Add a jobConf.set() call while setting up the drivers.
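Option 1 works because Hadoop's ToolRunner passes the command line through GenericOptionsParser, which moves generic options such as -D property=value into the job Configuration and hands only the remaining arguments to Tool.run(). The following is a minimal, stdlib-only sketch of that splitting step, not Hadoop's implementation: a plain Map stands in for Configuration, and the class and method names (GenericArgsSketch, splitGenericOptions, run) and the driver arguments are hypothetical. Like the usual "[genericOptions] [commandOptions]" ordering, it expects the -D options to come first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for what ToolRunner/GenericOptionsParser do
 * for option 1: leading "-D key=value" options are moved into a configuration
 * (here a plain Map, not Hadoop's Configuration) before the driver parses the rest.
 */
public class GenericArgsSketch {

    /**
     * Consumes leading -Dkey=value or "-D key=value" options into conf and
     * returns the remaining arguments, starting at the first non-generic one.
     */
    public static String[] splitGenericOptions(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i < args.length) {
            String property;
            int consumed;
            if (args[i].equals("-D") && i + 1 < args.length) {
                property = args[i + 1];           // separate form: -D key=value
                consumed = 2;
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                property = args[i].substring(2);  // joined form: -Dkey=value
                consumed = 1;
            } else {
                break;                            // first driver argument reached
            }
            int eq = property.indexOf('=');
            if (eq <= 0) {
                break;                            // malformed: leave it for the driver to reject
            }
            conf.put(property.substring(0, eq), property.substring(eq + 1));
            i += consumed;
        }
        return Arrays.copyOfRange(args, i, args.length);
    }

    /** Hypothetical driver body: like a strict option parser, it rejects any -D it sees. */
    public static int run(String[] driverArgs, Map<String, String> conf) {
        for (String arg : driverArgs) {
            if (arg.startsWith("-D")) {
                throw new IllegalArgumentException("Unexpected " + arg);
            }
        }
        // "default" is only this sketch's fallback when no queue was given.
        String queue = conf.getOrDefault("mapred.job.queue.name", "default");
        System.out.println("queue=" + queue + " args=" + String.join(" ", driverArgs));
        return 0;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        String[] cli = {"-Dmapred.job.queue.name=myqueuename", "--input", "in", "--output", "out"};
        String[] rest = splitGenericOptions(cli, conf);
        run(rest, conf);
    }
}
```

Running main prints "queue=myqueuename args=--input in --output out". Passing the unsplit array straight to run() instead throws, which mirrors a driver whose own option parser has never been told about -D.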

Personally, I prefer the first solution. Are there any plans on updating 
the various driver classes to support such capacity scheduler queues? 
Either way, I can help out in the process.

Adil

Re: Plans For Migrating Drivers to Tool interface?

Posted by Ted Dunning <te...@gmail.com>.
The first solution is much preferable for 0.20 users.

What happens to 0.19 users, though?  I understand that putting in extra
config variables isn't a problem, but how far back does Tool go?

Feel free to throw up a Jira and a patch.  I am sure that it would be looked
at quickly.

On Mon, Sep 14, 2009 at 3:19 PM, Adil Aijaz <ad...@yahoo-inc.com> wrote:

>
> Personally, I prefer the first solution. Are there any plans on updating
> the various driver classes to support such capacity scheduler queues? Either
> way, I can help out in the process.




-- 
Ted Dunning, CTO
DeepDyve

Re: Plans For Migrating Drivers to Tool interface?

Posted by Sean Owen <sr...@gmail.com>.
That looks a lot like an argument that sets a system property in the JVM.
And it sounds like you're passing it to the Java program rather than to
java itself. Just put it before the class name, that is... "java -D...
org.apache... foo bar baz"
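With the -D placed before the class name, the value becomes a JVM system property rather than a Hadoop job setting, so (an assumption worth checking against your Hadoop version) the driver may still need to copy it into the job configuration, which is essentially Adil's option 2. A minimal stdlib sketch of that copy step follows; the class and method names are hypothetical, and a Map stands in for the JobConf whose set() call it imitates.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: copy a JVM system property into a job configuration map. */
public class QueueFromSystemProperty {

    static final String QUEUE_KEY = "mapred.job.queue.name";

    /** Copies the queue name from the JVM's system properties, if set, into conf. */
    public static void applyQueueName(Map<String, String> conf) {
        // Set by: java -Dmapred.job.queue.name=... <class name> (before the class name)
        String queue = System.getProperty(QUEUE_KEY);
        if (queue != null && !queue.isEmpty()) {
            conf.put(QUEUE_KEY, queue); // stands in for jobConf.set(QUEUE_KEY, queue)
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyQueueName(conf);
        System.out.println(QUEUE_KEY + " = " + conf.getOrDefault(QUEUE_KEY, "(not set)"));
    }
}
```

"java -Dmapred.job.queue.name=myqueuename QueueFromSystemProperty" prints "mapred.job.queue.name = myqueuename", while putting the -D after the class name turns it into a program argument and prints "(not set)", which is exactly the distinction Sean describes.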

On Wed, Oct 28, 2009 at 1:10 AM, Gregory Lawrence <gr...@yahoo-inc.com> wrote:
> Hi,
>
> I'm having trouble running the KMeansDriver, and I suspect the problem is related to Adil's message. My environment recently switched to Hadoop 0.20, so I can no longer use HOD as a scheduler. Furthermore, I'm required to specify the queue (which, unfortunately, is not named "default"). This is normally done with -Dmapred.job.queue.name. Is there any way I can use Mahout, specifically the clustering code? When I run the KMeansDriver with the -D option, I get the following error message:
>
> 09/10/28 01:09:21 ERROR kmeans.KMeansDriver: Exception
> org.apache.commons.cli2.OptionException: Unexpected -D while processing Options

Re: Plans For Migrating Drivers to Tool interface?

Posted by Gregory Lawrence <gr...@yahoo-inc.com>.
Hi,

I'm having trouble running the KMeansDriver, and I suspect the problem is related to Adil's message. My environment recently switched to Hadoop 0.20, so I can no longer use HOD as a scheduler. Furthermore, I'm required to specify the queue (which, unfortunately, is not named "default"). This is normally done with -Dmapred.job.queue.name. Is there any way I can use Mahout, specifically the clustering code? When I run the KMeansDriver with the -D option, I get the following error message:

09/10/28 01:09:21 ERROR kmeans.KMeansDriver: Exception
org.apache.commons.cli2.OptionException: Unexpected -D while processing Options
