Posted to user@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2011/01/03 21:37:37 UTC

Re: where i can set -Dmapred.map.tasks=X

Hi Jeff,

So did you get around to fixing this? I am hitting this little bugger all
over the place, including book examples that don't work directly if I have
Hadoop set up on my machine, such as the following:

bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
-c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
/home/dmitriy/projects/testcollections/reuters-seqfiles
Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
org.apache.commons.cli2.OptionException: Unexpected
-Dmapred.job.tracker=local while processing Options
        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
        at
org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thanks.
-Dmitriy

On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman <je...@narus.com> wrote:

> The patch to MahoutDriver involves the code in the for loop at lines
> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
> argsList at position 1, else at the end. I will commit a patch for this
> tonight as I have not got my Narus CLA signed yet.
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
> Sent: Wednesday, December 29, 2010 11:46 AM
> To: user@mahout.apache.org
> Cc: dev@mahout.apache.org
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> ok, thank you, Jeff. Good to know. I actually expected to rely on this for
> a
> wide range of issues (most common being task jvm parameters override).
>
>

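The reordering Jeff describes in the quoted message above (add an argument at
position 1 if it starts with "-D", otherwise at the end) can be sketched in
plain Java. This is a minimal illustration and not the MahoutDriver source:
the class name DashDReorder and the method reorder are hypothetical, and it
assumes the program name sits at index 0 of the list.

```java
import java.util.ArrayList;
import java.util.List;

public class DashDReorder {

    // Hypothetical helper (not the MahoutDriver code): every argument that
    // starts with "-D" is inserted at position 1, right behind the program
    // name; all other arguments are appended at the end.
    static List<String> reorder(String programName, String[] args) {
        List<String> argsList = new ArrayList<>();
        argsList.add(programName);
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                argsList.add(1, arg);
            } else {
                argsList.add(arg);
            }
        }
        return argsList;
    }

    public static void main(String[] unused) {
        String[] args = {
            "-Dmapred.job.tracker=local", "-Dfs.default.name=file:///", "-c", "UTF-8"
        };
        System.out.println(reorder("seqdirectory", args));
    }
}
```

Because each -D argument is inserted at position 1, the -D arguments end up
in reverse order relative to the command line. That is consistent with the
argument list Dmitriy printed in this thread, where -Dfs.default.name comes
before -Dmapred.job.tracker.
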
Re: where i can set -Dmapred.map.tasks=X

Posted by Shige Takeda <st...@yahoo-inc.com>.
Jeff, I can help on this task if you don't mind.
One complexity I found is the case where one driver kicks off MR and/or
sequential jobs. The sequential path may not really need a conf, but it still
passes one to FileSystem.get(uri, conf) to obtain a FileSystem, and since
getConf() returns null there, the result is a NullPointerException.
Thanks,
-- Shige

Jeff Eastman wrote:
> Ok, this seems to be a more widespread problem. Let's identify all the places that need to be touched and I will commit them all at the same time.
>
> -----Original Message-----
> From: Shige Takeda [mailto:stakeda@yahoo-inc.com]
> Sent: Tuesday, January 04, 2011 9:03 AM
> To: user@mahout.apache.org
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> Hello,
>
> Coincidentally I came across the same problem last week and found the
> cause is Seq2Sparse's main didn't use ToolRunner.run(Tool,String[]),
> which automatically feeds -D parameters into a configuration object,
> which is accessible by Configurable.getConf().
>
> Also I see a lot of driver main functions, especially around
> clusterings, don't use ToolRunner.run(Tool,String[]) but
> ToolRunner.run(Configuration,Tool,String[]). A problem with the latter
> one is it doesn't consider the passed -D parameters.
>
> See the difference in this javadoc.
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/ToolRunner.html
>
> FYI, a specific problem to me is -Dmapred.job.queue.name=something is
> required when I run a job in the company's Hadoop cluster.
>
> Btw, any correction/suggestion to my comment is welcome as I'm also
> learning codes since last month.
>
> Thanks,
> -- Shige Takeda
>
> On 1/3/2011 8:27 PM, Jeff Eastman wrote:
>    
>> Seq2Sparse has this problem too? Not good. Users really need -D
>> abilities there. How about you JIRA your patch and I will get it in?
>>
>>
>> On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
>>      
>>> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
>>> of sense there since it is a composite job and i am not sure if
>>> configuration is propagated to those. But i got it too if need be.
>>>
>>> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>    wrote:
>>>
>>>        
>>>> Resolved in mahout-574.
>>>>
>>>>
>>>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>>>
>>>>          
>>>>> Yes, it could indeed. See my previous email which shows the problem unique
>>>>> to this class.
>>>>>
>>>>>
>>>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>>>
>>>>>            
>>>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>>>
>>>>>>
>>>>>>              
>
>
>    

Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK, thanks, that's the way then.

On Tue, Jan 4, 2011 at 1:37 PM, Sebastian Schelter <ss...@apache.org> wrote:

> IIRC nothing more than calling ToolRunner.run(...) with the current
> configuration from within your job class is needed to propagate the
> configuration when invoking other jobs.
>
> o.a.m.cf.taste.hadoop.item.RecommenderJob which internally calls
> RowSimilarityJob had the problem a while ago.
>
> --sebastian
>
> Am 04.01.2011 22:04, schrieb Dmitriy Lyubimov:
> > Sean,
> >
> > so, is there a comment or document on how to propagate configuration to
> > multiple jobs? or perhaps an example driver class that adheres to that?
> >
> >
> > On Tue, Jan 4, 2011 at 12:30 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> As a side point, the long-standing push to standardize on some
> >> approach for running MapReduce jobs (or groups of them), embodied in
> >> AbstractJob, would also solve this since details like this are handled
> >> already. It'd be good to move towards that model, not only because it
> >> fixes this and avoids some similar future issues, but for the sake of
> >> standardization.
> >>
> >>
> >> On Tue, Jan 4, 2011 at 12:30 PM, Dmitriy Lyubimov <dl...@gmail.com>
> >> wrote:
> >>> Jeff, he meant that those that _don't_ use ToolRunner can't parse -D.
> >> Those
> >>> that do use, can.
> >>>
> >>> I did patch for seq2sparse. It worked reasonably well for me (in a
> >> strange
> >>> way). However, I am hesitant to offer it. The reason like i said is
> that
> >>> unlike seqdirectory job, seq2sparse uses a lot of jobs and in order to
> >> make
> >>> use of -D parameters, it needs to make sure that either every one of
> them
> >> is
> >>> launched thru a ToolRunner, or propagate obtained Configuration object
> to
> >>> them explicitly using API-ish approach. Which my patch doesn't really
> >> take
> >>> care of to a due extent, there's more work to be done to do so.
> >>>
> >>> (BTW i realize my ssvd work suffers from this too).
> >>>
> >>> -d
> >>
> >
>
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Sebastian Schelter <ss...@apache.org>.
IIRC nothing more than calling ToolRunner.run(...) with the current
configuration from within your job class is needed to propagate the
configuration when invoking other jobs.

o.a.m.cf.taste.hadoop.item.RecommenderJob which internally calls
RowSimilarityJob had the problem a while ago.

--sebastian
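
The difference between handing the current configuration to a nested job and
letting it start from a fresh one can be sketched in plain Java. This is only
an illustration of the idea: MiniJob is a hypothetical stand-in, not a Hadoop
or Mahout type, a Map replaces Hadoop's Configuration, and the "default"
fallback value is a placeholder chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfPropagation {

    // Hypothetical stand-in for a configurable job.
    static class MiniJob {
        private Map<String, String> conf = new HashMap<>();

        void setConf(Map<String, String> conf) { this.conf = conf; }

        Map<String, String> getConf() { return conf; }

        // Reads one setting, falling back to a placeholder when it is absent.
        String queueName() {
            return conf.getOrDefault("mapred.job.queue.name", "default");
        }
    }

    // The child receives the parent's configuration, so -D settings survive.
    static String childWithParentConf(MiniJob parent) {
        MiniJob child = new MiniJob();
        child.setConf(parent.getConf());
        return child.queueName();
    }

    // The child starts from an empty configuration, so -D settings are lost.
    static String childWithFreshConf() {
        return new MiniJob().queueName();
    }

    public static void main(String[] unused) {
        MiniJob parent = new MiniJob();
        parent.getConf().put("mapred.job.queue.name", "something");
        System.out.println(childWithParentConf(parent));
        System.out.println(childWithFreshConf());
    }
}
```

In the sketch the first call sees the value set on the parent and the second
does not, which is the symptom described for composite jobs in this thread.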

Am 04.01.2011 22:04, schrieb Dmitriy Lyubimov:
> Sean,
> 
> so, is there a comment or document on how to propagate configuration to
> multiple jobs? or perhaps an example driver class that adheres to that?
> 
> 
> On Tue, Jan 4, 2011 at 12:30 PM, Sean Owen <sr...@gmail.com> wrote:
> 
>> As a side point, the long-standing push to standardize on some
>> approach for running MapReduce jobs (or groups of them), embodied in
>> AbstractJob, would also solve this since details like this are handled
>> already. It'd be good to move towards that model, not only because it
>> fixes this and avoids some similar future issues, but for the sake of
>> standardization.
>>
>>
>> On Tue, Jan 4, 2011 at 12:30 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> wrote:
>>> Jeff, he meant that those that _don't_ use ToolRunner can't parse -D.
>> Those
>>> that do use, can.
>>>
>>> I did patch for seq2sparse. It worked reasonably well for me (in a
>> strange
>>> way). However, I am hesitant to offer it. The reason like i said is that
>>> unlike seqdirectory job, seq2sparse uses a lot of jobs and in order to
>> make
>>> use of -D parameters, it needs to make sure that either every one of them
>> is
>>> launched thru a ToolRunner, or propagate obtained Configuration object to
>>> them explicitly using API-ish approach. Which my patch doesn't really
>> take
>>> care of to a due extent, there's more work to be done to do so.
>>>
>>> (BTW i realize my ssvd work suffers from this too).
>>>
>>> -d
>>
> 


Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Sean,

so, is there a comment or document on how to propagate configuration to
multiple jobs? or perhaps an example driver class that adheres to that?


On Tue, Jan 4, 2011 at 12:30 PM, Sean Owen <sr...@gmail.com> wrote:

> As a side point, the long-standing push to standardize on some
> approach for running MapReduce jobs (or groups of them), embodied in
> AbstractJob, would also solve this since details like this are handled
> already. It'd be good to move towards that model, not only because it
> fixes this and avoids some similar future issues, but for the sake of
> standardization.
>
>
> On Tue, Jan 4, 2011 at 12:30 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> > Jeff, he meant that those that _don't_ use ToolRunner can't parse -D.
> Those
> > that do use, can.
> >
> > I did patch for seq2sparse. It worked reasonably well for me (in a
> strange
> > way). However, I am hesitant to offer it. The reason like i said is that
> > unlike seqdirectory job, seq2sparse uses a lot of jobs and in order to
> make
> > use of -D parameters, it needs to make sure that either every one of them
> is
> > launched thru a ToolRunner, or propagate obtained Configuration object to
> > them explicitly using API-ish approach. Which my patch doesn't really
> take
> > care of to a due extent, there's more work to be done to do so.
> >
> > (BTW i realize my ssvd work suffers from this too).
> >
> > -d
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Sean Owen <sr...@gmail.com>.
As a side point, the long-standing push to standardize on some
approach for running MapReduce jobs (or groups of them), embodied in
AbstractJob, would also solve this since details like this are handled
already. It'd be good to move towards that model, not only because it
fixes this and avoids some similar future issues, but for the sake of
standardization.


On Tue, Jan 4, 2011 at 12:30 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> Jeff, he meant that those that _don't_ use ToolRunner can't parse -D. Those
> that do use, can.
>
> I did patch for seq2sparse. It worked reasonably well for me (in a strange
> way). However, I am hesitant to offer it. The reason like i said is that
> unlike seqdirectory job, seq2sparse uses a lot of jobs and in order to make
> use of -D parameters, it needs to make sure that either every one of them is
> launched thru a ToolRunner, or propagate obtained Configuration object to
> them explicitly using API-ish approach. Which my patch doesn't really take
> care of to a due extent, there's more work to be done to do so.
>
> (BTW i realize my ssvd work suffers from this too).
>
> -d

Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Jeff, he meant that those that _don't_ use ToolRunner can't parse -D. Those
that do use it, can.

I did a patch for seq2sparse. It worked reasonably well for me (in a strange
way). However, I am hesitant to offer it. The reason, like I said, is that
unlike the seqdirectory job, seq2sparse runs a lot of jobs, and in order to
make use of -D parameters it needs to either launch every one of them through
a ToolRunner or propagate the obtained Configuration object to them
explicitly through the API. My patch doesn't really take care of that to a
due extent; there's more work to be done.

(BTW i realize my ssvd work suffers from this too).

-d



On Tue, Jan 4, 2011 at 9:43 AM, Jeff Eastman <je...@narus.com> wrote:

> It's odd though, that kmeans works correctly with multiple -D arguments,
> even though it uses the ToolRunner.run(Configuration,Tool,String[]). Are you
> sure about the semantics difference? It's not obvious from the javadocs.
>
> -----Original Message-----
> From: Jeff Eastman [mailto:jeastman@narus.com]
> Sent: Tuesday, January 04, 2011 9:09 AM
> To: user@mahout.apache.org
> Subject: RE: where i can set -Dmapred.map.tasks=X
>
> Ok, this seems to be a more widespread problem. Let's identify all the
> places that need to be touched and I will commit them all at the same time.
>
> -----Original Message-----
> From: Shige Takeda [mailto:stakeda@yahoo-inc.com]
> Sent: Tuesday, January 04, 2011 9:03 AM
> To: user@mahout.apache.org
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> Hello,
>
> Coincidentally I came across the same problem last week and found the
> cause is Seq2Sparse's main didn't use ToolRunner.run(Tool,String[]),
> which automatically feeds -D parameters into a configuration object,
> which is accessible by Configurable.getConf().
>
> Also I see a lot of driver main functions, especially around
> clusterings, don't use ToolRunner.run(Tool,String[]) but
> ToolRunner.run(Configuration,Tool,String[]). A problem with the latter
> one is it doesn't consider the passed -D parameters.
>
> See the difference in this javadoc.
>
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/ToolRunner.html
>
> FYI, a specific problem to me is -Dmapred.job.queue.name=something is
> required when I run a job in the company's Hadoop cluster.
>
> Btw, any correction/suggestion to my comment is welcome as I'm also
> learning codes since last month.
>
> Thanks,
> -- Shige Takeda
>
> On 1/3/2011 8:27 PM, Jeff Eastman wrote:
> > Seq2Sparse has this problem too? Not good. Users really need -D
> > abilities there. How about you JIRA your patch and I will get it in?
> >
> >
> > On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
> >> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a
> lot
> >> of sense there since it is a composite job and i am not sure if
> >> configuration is propagated to those. But i got it too if need be.
> >>
> >> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>
> wrote:
> >>
> >>> Resolved in mahout-574.
> >>>
> >>>
> >>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<
> jdog@windwardsolutions.com>wrote:
> >>>
> >>>> Yes, it could indeed. See my previous email which shows the problem
> unique
> >>>> to this class.
> >>>>
> >>>>
> >>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
> >>>>
> >>>>> Could it be because SequenceFilesFromDirectory is not an
> AbstractJob?
> >>>>>
> >>>>>
>
>
>

RE: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <je...@Narus.com>.
It's odd though, that kmeans works correctly with multiple -D arguments, even though it uses the ToolRunner.run(Configuration,Tool,String[]). Are you sure about the semantics difference? It's not obvious from the javadocs.

-----Original Message-----
From: Jeff Eastman [mailto:jeastman@narus.com] 
Sent: Tuesday, January 04, 2011 9:09 AM
To: user@mahout.apache.org
Subject: RE: where i can set -Dmapred.map.tasks=X

Ok, this seems to be a more widespread problem. Let's identify all the places that need to be touched and I will commit them all at the same time.

-----Original Message-----
From: Shige Takeda [mailto:stakeda@yahoo-inc.com] 
Sent: Tuesday, January 04, 2011 9:03 AM
To: user@mahout.apache.org
Subject: Re: where i can set -Dmapred.map.tasks=X

Hello,

Coincidentally I came across the same problem last week and found the 
cause is Seq2Sparse's main didn't use ToolRunner.run(Tool,String[]), 
which automatically feeds -D parameters into a configuration object, 
which is accessible by Configurable.getConf().

Also I see a lot of driver main functions, especially around 
clusterings, don't use ToolRunner.run(Tool,String[]) but
ToolRunner.run(Configuration,Tool,String[]). A problem with the latter
one is it doesn't consider the passed -D parameters.

See the difference in this javadoc.
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/ToolRunner.html

FYI, a specific problem to me is -Dmapred.job.queue.name=something is 
required when I run a job in the company's Hadoop cluster.

Btw, any correction/suggestion to my comment is welcome as I'm also 
learning codes since last month.

Thanks,
-- Shige Takeda

On 1/3/2011 8:27 PM, Jeff Eastman wrote:
> Seq2Sparse has this problem too? Not good. Users really need -D
> abilities there. How about you JIRA your patch and I will get it in?
>
>
> On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
>> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
>> of sense there since it is a composite job and i am not sure if
>> configuration is propagated to those. But i got it too if need be.
>>
>> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>   wrote:
>>
>>> Resolved in mahout-574.
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>>
>>>> Yes, it could indeed. See my previous email which shows the problem unique
>>>> to this class.
>>>>
>>>>
>>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>>
>>>>>



RE: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <je...@Narus.com>.
Ok, this seems to be a more widespread problem. Let's identify all the places that need to be touched and I will commit them all at the same time.

-----Original Message-----
From: Shige Takeda [mailto:stakeda@yahoo-inc.com] 
Sent: Tuesday, January 04, 2011 9:03 AM
To: user@mahout.apache.org
Subject: Re: where i can set -Dmapred.map.tasks=X

Hello,

Coincidentally I came across the same problem last week and found the 
cause is Seq2Sparse's main didn't use ToolRunner.run(Tool,String[]), 
which automatically feeds -D parameters into a configuration object, 
which is accessible by Configurable.getConf().

Also I see a lot of driver main functions, especially around 
clusterings, don't use ToolRunner.run(Tool,String[]) but
ToolRunner.run(Configuration,Tool,String[]). A problem with the latter
one is it doesn't consider the passed -D parameters.

See the difference in this javadoc.
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/ToolRunner.html

FYI, a specific problem to me is -Dmapred.job.queue.name=something is 
required when I run a job in the company's Hadoop cluster.

Btw, any correction/suggestion to my comment is welcome as I'm also 
learning codes since last month.

Thanks,
-- Shige Takeda

On 1/3/2011 8:27 PM, Jeff Eastman wrote:
> Seq2Sparse has this problem too? Not good. Users really need -D
> abilities there. How about you JIRA your patch and I will get it in?
>
>
> On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
>> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
>> of sense there since it is a composite job and i am not sure if
>> configuration is propagated to those. But i got it too if need be.
>>
>> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>   wrote:
>>
>>> Resolved in mahout-574.
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>>
>>>> Yes, it could indeed. See my previous email which shows the problem unique
>>>> to this class.
>>>>
>>>>
>>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>>
>>>>>



Re: where i can set -Dmapred.map.tasks=X

Posted by Shige Takeda <st...@yahoo-inc.com>.
Hello,

Coincidentally I came across the same problem last week and found the 
cause is Seq2Sparse's main didn't use ToolRunner.run(Tool,String[]), 
which automatically feeds -D parameters into a configuration object, 
which is accessible by Configurable.getConf().

Also I see a lot of driver main functions, especially around 
clusterings, don't use ToolRunner.run(Tool,String[]) but
ToolRunner.run(Configuration,Tool,String[]). A problem with the latter
one is it doesn't consider the passed -D parameters.

See the difference in this javadoc.
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/ToolRunner.html

FYI, a specific problem to me is -Dmapred.job.queue.name=something is 
required when I run a job in the company's Hadoop cluster.

Btw, any correction/suggestion to my comment is welcome, as I have only been
learning the code since last month.

Thanks,
-- Shige Takeda
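
The idea of feeding -D parameters into a configuration object before the
job's own option parser sees the arguments can be sketched in plain Java.
This is not Hadoop's implementation: the class DashDSplit is hypothetical, it
handles only the joined -Dkey=value form used in this thread, and a Map
stands in for Hadoop's Configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DashDSplit {

    // Settings collected from -Dkey=value arguments.
    final Map<String, String> conf = new LinkedHashMap<>();

    // Everything else, left for the job's own command-line parser.
    final List<String> remaining = new ArrayList<>();

    DashDSplit(String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require "-D", a non-empty key, and an '=' sign.
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
    }

    public static void main(String[] unused) {
        DashDSplit split = new DashDSplit(new String[] {
            "-Dmapred.job.queue.name=something", "-c", "UTF-8"
        });
        System.out.println(split.conf);
        System.out.println(split.remaining);
    }
}
```

A strict option parser that only ever receives the remaining list never sees
a -D argument, so it has nothing to reject as unexpected.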

On 1/3/2011 8:27 PM, Jeff Eastman wrote:
> Seq2Sparse has this problem too? Not good. Users really need -D
> abilities there. How about you JIRA your patch and I will get it in?
>
>
> On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
>> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
>> of sense there since it is a composite job and i am not sure if
>> configuration is propagated to those. But i got it too if need be.
>>
>> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>   wrote:
>>
>>> Resolved in mahout-574.
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>>
>>>> Yes, it could indeed. See my previous email which shows the problem unique
>>>> to this class.
>>>>
>>>>
>>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>>
>>>>>



Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
yes it does.  ok will do tomorrow. Thank you, Jeff.
-Dmitriy

On Mon, Jan 3, 2011 at 8:27 PM, Jeff Eastman <jd...@windwardsolutions.com>wrote:

> Seq2Sparse has this problem too? Not good. Users really need -D abilities
> there. How about you JIRA your patch and I will get it in?
>
>
>
> On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
>
>> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a
>> lot
>> of sense there since it is a composite job and i am not sure if
>> configuration is propagated to those. But i got it too if need be.
>>
>> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>
>>  wrote:
>>
>> Resolved in mahout-574.
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jdog@windwardsolutions.com
>>> >wrote:
>>>
>>> Yes, it could indeed. See my previous email which shows the problem
>>>> unique
>>>> to this class.
>>>>
>>>>
>>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>>
>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>>
>>>>>
>>>>>
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Seq2Sparse has this problem too? Not good. Users really need -D 
abilities there. How about you JIRA your patch and I will get it in?


On 1/3/11 7:43 PM, Dmitriy Lyubimov wrote:
> Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
> of sense there since it is a composite job and i am not sure if
> configuration is propagated to those. But i got it too if need be.
>
> On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov<dl...@gmail.com>  wrote:
>
>> Resolved in mahout-574.
>>
>>
>> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>
>>> Yes, it could indeed. See my previous email which shows the problem unique
>>> to this class.
>>>
>>>
>>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>>
>>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>>
>>>>


Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Jeff, i also have a similar patch for seq2sparse. Not sure if it makes a lot
of sense there since it is a composite job and i am not sure if
configuration is propagated to those. But i got it too if need be.

On Mon, Jan 3, 2011 at 5:36 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Resolved in mahout-574.
>
>
> On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman <jd...@windwardsolutions.com>wrote:
>
>> Yes, it could indeed. See my previous email which shows the problem unique
>> to this class.
>>
>>
>> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>>
>>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>>
>>>

Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Resolved in mahout-574.

On Mon, Jan 3, 2011 at 3:49 PM, Jeff Eastman <jd...@windwardsolutions.com>wrote:

> Yes, it could indeed. See my previous email which shows the problem unique
> to this class.
>
>
> On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
>
>> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>>
>>

Re: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Yes, it could indeed. See my previous email which shows the problem 
unique to this class.

On 1/3/11 3:30 PM, Dmitriy Lyubimov wrote:
> Could it be because SequenceFilesFromDirectory is not an AbstractJob?
>
> On Mon, Jan 3, 2011 at 3:21 PM, Dmitriy Lyubimov<dl...@gmail.com>  wrote:
>
>> I printed out arguments that it supplies to hadoop program driver:
>>
>> [seqdirectory, -Dfs.default.name=file:///, -Dmapred.job.tracker=local, -c,
>> UTF-8, -o, /home/dmitriy/projects/testcollections/reuters-seqfiles, -i,
>> /home/dmitriy/projects/testcollections/reuters-extracted/]
>>
>>
>> So it seems to be doing the right thing with the ordering now but it still
>> doesn't work for some reason with this particular command line.
>>
>> -Dmitriy
>>
>>
>> On Mon, Jan 3, 2011 at 3:17 PM, Dmitriy Lyubimov<dl...@gmail.com>wrote:
>>
>>> Jeff,
>>> now it stopped complaining about first -D but started doing so about the
>>> second one.
>>>
>>>
>>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>>> 11/01/03 15:16:13 ERROR text.SequenceFilesFromDirectory: Exception
>>> org.apache.commons.cli2.OptionException: Unexpected -Dfs.default.name=file:///
>>> while processing Options
>>>
>>>          at
>>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>          at
>>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:201)
>>>
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>          at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>          at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>          at
>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>          at
>>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>          at
>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:183)
>>>
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>          at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>          at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>
>>>
>>> On Mon, Jan 3, 2011 at 1:04 PM, Jeff Eastman<jd...@windwardsolutions.com>wrote:
>>>
>>>> Yes, I committed a small patch on the 29th. Try a new trunk build.
>>>>
>>>>
>>>> On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> so did you get around to fixing this? i am having this little bugger all
>>>>> over the place, including book examples that don't work directly if i
>>>>> have
>>>>> hadoop setup on my machine such as in the following:
>>>>>
>>>>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name
>>>>> =file:///
>>>>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>>>>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>>>>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>>>>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>>>>> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
>>>>> org.apache.commons.cli2.OptionException: Unexpected
>>>>> -Dmapred.job.tracker=local while processing Options
>>>>>          at
>>>>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>>>          at
>>>>>
>>>>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>          at
>>>>>
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>          at
>>>>>
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>          at
>>>>>
>>>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>>          at
>>>>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>>          at
>>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>          at
>>>>>
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>          at
>>>>>
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>>>
>>>>>
>>>>> Thanks.
>>>>> -Dmitriy
>>>>>
>>>>> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>
>>>>>   wrote:
>>>>>
>>>>>   The patch to MahoutDriver involves the code in the for loop at lines
>>>>>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>>>>>> argsList at position 1, else at the end. I will commit a patch for this
>>>>>> tonight as I have not got my Narus CLA signed yet.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>>>>> Sent: Wednesday, December 29, 2010 11:46 AM
>>>>>> To: user@mahout.apache.org
>>>>>> Cc: dev@mahout.apache.org
>>>>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>>>>
>>>>>> ok, thank you, Jeff. Good to know. I actually expected to rely on this
>>>>>> for
>>>>>> a
>>>>>> wide range of issues (most common being task jvm parameters override).
>>>>>>
>>>>>>
>>>>>>


Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Could it be because SequenceFilesFromDirectory is not an AbstractJob?

On Mon, Jan 3, 2011 at 3:21 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> I printed out arguments that it supplies to hadoop program driver:
>
> [seqdirectory, -Dfs.default.name=file:///, -Dmapred.job.tracker=local, -c,
> UTF-8, -o, /home/dmitriy/projects/testcollections/reuters-seqfiles, -i,
> /home/dmitriy/projects/testcollections/reuters-extracted/]
>
>
> So it seems to be doing the right thing with the ordering now but it still
> doesn't work for some reason with this particular command line.
>
> -Dmitriy
>
>
> On Mon, Jan 3, 2011 at 3:17 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> Jeff,
>> now it stopped complaining about first -D but started doing so about the
>> second one.
>>
>>
>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>> 11/01/03 15:16:13 ERROR text.SequenceFilesFromDirectory: Exception
>> org.apache.commons.cli2.OptionException: Unexpected -Dfs.default.name=file:///
>> while processing Options
>>
>>         at
>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>         at
>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:201)
>>
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>         at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>         at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:183)
>>
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>
>>
>> On Mon, Jan 3, 2011 at 1:04 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>>
>>> Yes, I committed a small patch on the 29th. Try a new trunk build.
>>>
>>>
>>> On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> so did you get around to fixing this? i am having this little bugger all
>>>> over the place, including book examples that don't work directly if i
>>>> have hadoop setup on my machine such as in the following:
>>>>
>>>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>>>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>>>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>>>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>>>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>>>> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
>>>> org.apache.commons.cli2.OptionException: Unexpected
>>>> -Dmapred.job.tracker=local while processing Options
>>>>         at
>>>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>>         at
>>>>
>>>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at
>>>>
>>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>         at
>>>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>         at
>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>>
>>>>
>>>> Thanks.
>>>> -Dmitriy
>>>>
>>>> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>
>>>>  wrote:
>>>>
>>>>  The patch to MahoutDriver involves the code in the for loop at lines
>>>>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>>>>> argsList at position 1, else at the end. I will commit a patch for this
>>>>> tonight as I have not got my Narus CLA signed yet.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>>>> Sent: Wednesday, December 29, 2010 11:46 AM
>>>>> To: user@mahout.apache.org
>>>>> Cc: dev@mahout.apache.org
>>>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>>>
>>>>> ok, thank you, Jeff. Good to know. I actually expected to rely on this
>>>>> for a wide range of issues (most common being task jvm parameters
>>>>> override).
>>>>>
>>>>>
>>>>>
>>>
>>
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
It works for me with this command line:

./bin/mahout kmeans -Dmapred.reduce.tasks=10 -Dmapred.map.tasks=10 
-Dfs.default.name=file:/// -Dmapred.job.tracker=local -i foo -c bar -o 
baz -x 10

but yours does not work for me either:

./bin/mahout seqdirectory -Dmapred.job.tracker=local 
-Dfs.default.name=file:/// -c UTF-8 -i 
/home/dmitriy/projects/testcollections/reuters-extracted/ -o 
/home/dmitriy/projects/testcollections/reuters-seqfiles

This seems to be a different problem.


On 1/3/11 3:21 PM, Dmitriy Lyubimov wrote:
> I printed out arguments that it supplies to hadoop program driver:
>
> [seqdirectory, -Dfs.default.name=file:///, -Dmapred.job.tracker=local, -c,
> UTF-8, -o, /home/dmitriy/projects/testcollections/reuters-seqfiles, -i,
> /home/dmitriy/projects/testcollections/reuters-extracted/]
>
>
> So it seems to be doing the right thing with the ordering now but it still
> doesn't work for some reason with this particular command line.
>
> -Dmitriy
>
> On Mon, Jan 3, 2011 at 3:17 PM, Dmitriy Lyubimov<dl...@gmail.com>  wrote:
>
>> Jeff,
>> now it stopped complaining about first -D but started doing so about the
>> second one.
>>
>>
>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>> 11/01/03 15:16:13 ERROR text.SequenceFilesFromDirectory: Exception
>> org.apache.commons.cli2.OptionException: Unexpected -Dfs.default.name=file:///
>> while processing Options
>>
>>          at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>          at
>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:201)
>>
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>          at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>          at
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>          at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>          at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:183)
>>
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>          at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>
>>
>> On Mon, Jan 3, 2011 at 1:04 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>>
>>> Yes, I committed a small patch on the 29th. Try a new trunk build.
>>>
>>>
>>> On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> so did you get around to fixing this? i am having this little bugger all
>>>> over the place, including book examples that don't work directly if i
>>>> have hadoop setup on my machine such as in the following:
>>>>
>>>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>>>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>>>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>>>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>>>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>>>> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
>>>> org.apache.commons.cli2.OptionException: Unexpected
>>>> -Dmapred.job.tracker=local while processing Options
>>>>          at
>>>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>>          at
>>>>
>>>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>          at
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>          at
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>>          at
>>>>
>>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>          at
>>>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>          at
>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>          at
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>          at
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>>
>>>>
>>>> Thanks.
>>>> -Dmitriy
>>>>
>>>> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>
>>>>   wrote:
>>>>
>>>>   The patch to MahoutDriver involves the code in the for loop at lines
>>>>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>>>>> argsList at position 1, else at the end. I will commit a patch for this
>>>>> tonight as I have not got my Narus CLA signed yet.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>>>> Sent: Wednesday, December 29, 2010 11:46 AM
>>>>> To: user@mahout.apache.org
>>>>> Cc: dev@mahout.apache.org
>>>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>>>
>>>>> ok, thank you, Jeff. Good to know. I actually expected to rely on this
>>>>> for a wide range of issues (most common being task jvm parameters
>>>>> override).
>>>>>
>>>>>
>>>>>


Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I printed out the arguments that it supplies to the Hadoop ProgramDriver:

[seqdirectory, -Dfs.default.name=file:///, -Dmapred.job.tracker=local, -c,
UTF-8, -o, /home/dmitriy/projects/testcollections/reuters-seqfiles, -i,
/home/dmitriy/projects/testcollections/reuters-extracted/]


So it seems to be doing the right thing with the ordering now, but it still
doesn't work for some reason with this particular command line.
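
For reference, the reordering Jeff described (each -D argument inserted into argsList at position 1, everything else appended at the end) can be sketched as below. This is a minimal illustration under my own assumptions, not the actual MahoutDriver code: the class name DashDReorder, the method reorder, and the short "in"/"out" paths are made up for the example. It also shows why the two -D options come out in reverse order in the list above: each new one is inserted in front of the previous one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the -D reordering rule; not the real MahoutDriver.
class DashDReorder {
    // Keeps the program name at index 0, inserts every -D argument at
    // index 1, and appends all other arguments in their original order.
    static List<String> reorder(String[] args) {
        List<String> argsList = new ArrayList<>();
        if (args.length == 0) {
            return argsList;
        }
        argsList.add(args[0]); // program name, e.g. "seqdirectory"
        for (int i = 1; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-D")) {
                argsList.add(1, arg); // goes right after the program name
            } else {
                argsList.add(arg);    // goes at the end
            }
        }
        return argsList;
    }

    public static void main(String[] args) {
        String[] cmd = {
            "seqdirectory", "-Dmapred.job.tracker=local",
            "-Dfs.default.name=file:///", "-c", "UTF-8", "-i", "in", "-o", "out"
        };
        // prints [seqdirectory, -Dfs.default.name=file:///,
        //         -Dmapred.job.tracker=local, -c, UTF-8, -i, in, -o, out]
        // (on a single line)
        System.out.println(reorder(cmd));
    }
}
```

Moving the -D options to the front only helps if whatever receives the list actually strips them before the job's own option parsing, which does not seem to happen for seqdirectory.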

-Dmitriy

On Mon, Jan 3, 2011 at 3:17 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Jeff,
> now it stopped complaining about first -D but started doing so about the
> second one.
>
>
> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
> /home/dmitriy/projects/testcollections/reuters-seqfiles
> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
> 11/01/03 15:16:13 ERROR text.SequenceFilesFromDirectory: Exception
> org.apache.commons.cli2.OptionException: Unexpected -Dfs.default.name=file:///
> while processing Options
>
>         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>         at
> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:201)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at
> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:183)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
>
> On Mon, Jan 3, 2011 at 1:04 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>
>> Yes, I committed a small patch on the 29th. Try a new trunk build.
>>
>>
>> On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
>>
>>> Hi Jeff,
>>>
>>> so did you get around to fixing this? i am having this little bugger all
>>> over the place, including book examples that don't work directly if i
>>> have hadoop setup on my machine such as in the following:
>>>
>>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>>> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
>>> org.apache.commons.cli2.OptionException: Unexpected
>>> -Dmapred.job.tracker=local while processing Options
>>>         at
>>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>>         at
>>>
>>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at
>>>
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at
>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at
>>>
>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>         at
>>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>         at
>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at
>>>
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at
>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>
>>>
>>> Thanks.
>>> -Dmitriy
>>>
>>> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>
>>>  wrote:
>>>
>>>  The patch to MahoutDriver involves the code in the for loop at lines
>>>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>>>> argsList at position 1, else at the end. I will commit a patch for this
>>>> tonight as I have not got my Narus CLA signed yet.
>>>>
>>>> -----Original Message-----
>>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>>> Sent: Wednesday, December 29, 2010 11:46 AM
>>>> To: user@mahout.apache.org
>>>> Cc: dev@mahout.apache.org
>>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>>
>>>> ok, thank you, Jeff. Good to know. I actually expected to rely on this
>>>> for a wide range of issues (most common being task jvm parameters
>>>> override).
>>>>
>>>>
>>>>
>>
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Jeff,
now it has stopped complaining about the first -D but has started complaining
about the second one.

bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
-c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
/home/dmitriy/projects/testcollections/reuters-seqfiles
Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
11/01/03 15:16:13 ERROR text.SequenceFilesFromDirectory: Exception
org.apache.commons.cli2.OptionException: Unexpected -Dfs.default.name=file:///
while processing Options
        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
        at
org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:201)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:183)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


On Mon, Jan 3, 2011 at 1:04 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> Yes, I committed a small patch on the 29th. Try a new trunk build.
>
>
> On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
>
>> Hi Jeff,
>>
>> so did you get around to fixing this? i am having this little bugger all
>> over the place, including book examples that don't work directly if i have
>> hadoop setup on my machine such as in the following:
>>
>> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
>> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
>> /home/dmitriy/projects/testcollections/reuters-seqfiles
>> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
>> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
>> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
>> org.apache.commons.cli2.OptionException: Unexpected
>> -Dmapred.job.tracker=local while processing Options
>>         at
>> org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>>         at
>>
>> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at
>>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>         at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>         at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>
>>
>> Thanks.
>> -Dmitriy
>>
>> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>
>>  wrote:
>>
>>  The patch to MahoutDriver involves the code in the for loop at lines
>>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>>> argsList at position 1, else at the end. I will commit a patch for this
>>> tonight as I have not got my Narus CLA signed yet.
>>>
>>> -----Original Message-----
>>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>>> Sent: Wednesday, December 29, 2010 11:46 AM
>>> To: user@mahout.apache.org
>>> Cc: dev@mahout.apache.org
>>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>>
>>> ok, thank you, Jeff. Good to know. I actually expected to rely on this
>>> for a wide range of issues (most common being task jvm parameters
>>> override).
>>>
>>>
>>>
>

Re: where i can set -Dmapred.map.tasks=X

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Yes, I committed a small patch on the 29th. Try a new trunk build.

On 1/3/11 12:37 PM, Dmitriy Lyubimov wrote:
> Hi Jeff,
>
> so did you get around to fixing this? i am having this little bugger all
> over the place, including book examples that don't work directly if i have
> hadoop setup on my machine such as in the following:
>
> bin/mahout seqdirectory -Dmapred.job.tracker=local -Dfs.default.name=file:///
> -c UTF-8 -i /home/dmitriy/projects/testcollections/reuters-extracted/ -o
> /home/dmitriy/projects/testcollections/reuters-seqfiles
> Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
> No HADOOP_CONF_DIR set, using /home/dmitriy/tools/hadoop/conf
> 11/01/03 12:32:06 ERROR text.SequenceFilesFromDirectory: Exception
> org.apache.commons.cli2.OptionException: Unexpected
> -Dmapred.job.tracker=local while processing Options
>          at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
>          at
> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:187)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>          at
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>          at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
>
> Thanks.
> -Dmitriy
>
> On Wed, Dec 29, 2010 at 11:50 AM, Jeff Eastman<je...@narus.com>  wrote:
>
>> The patch to MahoutDriver involves the code in the for loop at lines
>> 203-216. If the arg.startsWith("-D") then the arg needs to be added to
>> argsList at position 1, else at the end. I will commit a patch for this
>> tonight as I have not got my Narus CLA signed yet.
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
>> Sent: Wednesday, December 29, 2010 11:46 AM
>> To: user@mahout.apache.org
>> Cc: dev@mahout.apache.org
>> Subject: Re: where i can set -Dmapred.map.tasks=X
>>
>> ok, thank you, Jeff. Good to know. I actually expected to rely on this for
>> a wide range of issues (most common being task jvm parameters override).
>>
>>