Posted to user@mahout.apache.org by Mario Rodriguez <ma...@gmail.com> on 2013/08/31 19:56:59 UTC

MAHOUT_OPTS not taking effect when running mahout locally

Hi everyone,

It seems MAHOUT_OPTS is not getting picked up when running mahout locally
(MAHOUT_LOCAL=true).  This can be fixed by switching the order in which
MAHOUT_OPTS is passed in bin/mahout from:

exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"

to:

exec "$JAVA" $JAVA_HEAP_MAX -classpath "$CLASSPATH" $CLASS "$@" $MAHOUT_OPTS


I can't guarantee it won't break some other way of running it; it does not
look like it will, but I have not tested it.

Cheers,

Mario
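
To see what the reordering changes, here is a minimal sketch (not part of
bin/mahout) that mimics how the java launcher splits its command line: tokens
before the main class are treated as JVM options, and everything after it is
handed to the application. The function name classify_args and the sample
values are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical stand-in for the java launcher's argument split: tokens before
# the main class are JVM options; tokens after it go to main(String[] args).
classify_args() {
  seen_class=""
  jvm_opts=""
  app_args=""
  while [ $# -gt 0 ]; do
    if [ -n "$seen_class" ]; then
      app_args="$app_args $1"
    else
      case "$1" in
        -classpath|-cp) shift ;;            # skip the option's value as well
        -*) jvm_opts="$jvm_opts $1" ;;      # any other dash token is a JVM option
        *) seen_class="$1" ;;               # first bare token is the main class
      esac
    fi
    shift
  done
  echo "jvm:${jvm_opts} | app:${app_args}"
}

MAHOUT_OPTS="-Dmapred.map.tasks=1"   # sample value, not a recommendation

# Original order in bin/mahout: MAHOUT_OPTS before the class.
classify_args -Xmx3g $MAHOUT_OPTS -classpath /tmp/cp org.apache.mahout.driver.MahoutDriver seqdirectory
# → jvm: -Xmx3g -Dmapred.map.tasks=1 | app: seqdirectory

# Proposed order: MAHOUT_OPTS after "$@".
classify_args -Xmx3g -classpath /tmp/cp org.apache.mahout.driver.MahoutDriver seqdirectory $MAHOUT_OPTS
# → jvm: -Xmx3g | app: seqdirectory -Dmapred.map.tasks=1
```

With the original order the -D setting lands among the JVM options; with the
proposed order it becomes an ordinary application argument.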

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Mario Rodriguez <ma...@gmail.com>.
Hi Harsh,

Yes, I agree with you that there needs to be a way to pass args to the JVM.
As for app args, technically it isn't necessary to have another env-var
for them, since users can just define them in their own scripts.  However,
adding the env-var you suggest, MAHOUT_APP_OPTS, would help clear up some
confusion, IMO: right now, what is set in bin/mahout for MAHOUT_OPTS
misleads users into thinking that those settings will actually be used.


On Wed, Sep 4, 2013 at 1:23 AM, Harsh J <ha...@cloudera.com> wrote:

> Here's what I'm trying to say: In most of the other projects, such as
> Hadoop, Pig, Sqoop, Flume, etc., PROJECT_OPTS is used to specify
> "Additional JVM arguments" rather than application arguments. It has
> been the same in Mahout too, so MAHOUT_OPTS was never intended to be
> a way to pass application options/configs to the runtime, but rather
> to control heap space, system properties, etc.
>
> The change you're proposing moves it AFTER the class invocation, which
> would break other uses relying on its correct use today. Instead, you
> could introduce a new env-var, MAHOUT_APP_OPTS, which goes after the
> classname and can accept all those -D generic conf params.

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Harsh J <ha...@cloudera.com>.
Here's what I'm trying to say: In most of the other projects, such as
Hadoop, Pig, Sqoop, Flume, etc., PROJECT_OPTS is used to specify
"Additional JVM arguments" rather than application arguments. It has
been the same in Mahout too, so MAHOUT_OPTS was never intended to be
a way to pass application options/configs to the runtime, but rather
to control heap space, system properties, etc.

The change you're proposing moves it AFTER the class invocation, which
would break other uses relying on its correct use today. Instead, you
could introduce a new env-var, MAHOUT_APP_OPTS, which goes after the
classname and can accept all those -D generic conf params.
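
A rough sketch of how the exec line could look with such a variable follows.
MAHOUT_APP_OPTS is only the name proposed in this thread, not something
bin/mahout is known to support, and build_mahout_cmd plus all values below are
placeholders. Placing it after "$@" is also an assumption; whether the driver
and each job would actually honour -D settings in that position is not
verified here.

```shell
#!/bin/sh
# Sketch only: MAHOUT_APP_OPTS is the env-var proposed in this thread, not a
# variable that bin/mahout is known to support. All values are placeholders.
build_mahout_cmd() {
  # Print the command instead of exec-ing it, so the ordering can be inspected.
  # MAHOUT_OPTS stays before the class (JVM options); MAHOUT_APP_OPTS is
  # placed after the class and after the user's own arguments (an assumption).
  echo "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@" $MAHOUT_APP_OPTS
}

JAVA=java
JAVA_HEAP_MAX=-Xmx3g
MAHOUT_OPTS="-Dhadoop.log.dir=/tmp/logs"
MAHOUT_APP_OPTS="-Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1"
CLASSPATH=/tmp/cp
CLASS=org.apache.mahout.driver.MahoutDriver

# Dry run: shows where each group of options would end up.
build_mahout_cmd seqdirectory -i in -o out
```

Keeping the two variables separate leaves existing uses of MAHOUT_OPTS
untouched while giving application-level settings their own slot.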

On Sun, Sep 1, 2013 at 4:06 AM, Mario Rodriguez <ma...@gmail.com> wrote:
> What I'm passing in MAHOUT_OPTS are parameters of the same nature as those
> being set in bin/mahout:
>
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.dir=$MAHOUT_LOG_DIR"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.file=$MAHOUT_LOGFILE"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=512MB"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.output.compress=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.compress.map.output=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.factor=30"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.mb=1024"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.file.buffer.size=32786"
>
>
> I have a beefy dev box, and so can afford to tune those values.
>
> In the current exec call, those parameters are not considered in the tasks
> being launched by org.apache.mahout.driver.MahoutDriver.
>
> I can look at this in more detail when I'm back in the office on Monday and
> submit a JIRA ticket and patch (depending on how involved the right fix
> turns out to be).
>
> Cheers,
>
> Mario



-- 
Harsh J

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Mario Rodriguez <ma...@gmail.com>.
What I'm passing in MAHOUT_OPTS are parameters of the same nature as those
being set in bin/mahout:

MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.dir=$MAHOUT_LOG_DIR"
MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.file=$MAHOUT_LOGFILE"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=512MB"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.child.java.opts=-Xmx4096m"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.child.java.opts=-Xmx4096m"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.output.compress=true"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.compress.map.output=true"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=1"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=1"
MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.factor=30"
MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.mb=1024"
MAHOUT_OPTS="$MAHOUT_OPTS -Dio.file.buffer.size=32786"


I have a beefy dev box, and so can afford to tune those values.

In the current exec call, those parameters are not considered in the tasks
being launched by org.apache.mahout.driver.MahoutDriver.

I can look at this in more detail when I'm back in the office on Monday and
submit a JIRA ticket and patch (depending on how involved the right fix
turns out to be).

Cheers,

Mario

>
>
> On Sat, Aug 31, 2013 at 2:34 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> I don't quite know what it's used for, but that order change can be
>> considered incompatible, mainly because in its current form it applies
>> (and doubles up) directly to the JVM that launches Mahout, but the
>> changed form makes it into application-only arguments.

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Mario Rodriguez <ma...@gmail.com>.
What I'm passing in MAHOUT_OPTS


On Sat, Aug 31, 2013 at 2:34 PM, Harsh J <ha...@cloudera.com> wrote:

> I don't quite know what it's used for, but that order change can be
> considered incompatible, mainly because in its current form it applies
> (and doubles up) directly to the JVM that launches Mahout, but the
> changed form makes it into application-only arguments.

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Harsh J <ha...@cloudera.com>.
I don't quite know what it's used for, but that order change can be
considered incompatible, mainly because in its current form it applies
(and doubles up) directly to the JVM that launches Mahout, but the
changed form makes it into application-only arguments.

On Sun, Sep 1, 2013 at 1:05 AM, Gokhan Capan <gk...@gmail.com> wrote:
> Hi Mario,
>
> Could you create a JIRA ticket for that, and submit your diff as a patch if
> possible?
> http://issues.apache.org/jira/browse/MAHOUT
>
> Best,
> Gokhan



-- 
Harsh J

Re: MAHOUT_OPTS not taking effect when running mahout locally

Posted by Gokhan Capan <gk...@gmail.com>.
Hi Mario,

Could you create a JIRA ticket for that, and submit your diff as a patch if
possible?
http://issues.apache.org/jira/browse/MAHOUT

Best,
Gokhan


On Sat, Aug 31, 2013 at 8:56 PM, Mario Rodriguez <ma...@gmail.com> wrote:

> Hi everyone,
>
> It seems MAHOUT_OPTS is not getting picked up when running mahout locally
> (MAHOUT_LOCAL=true).  This can be fixed by switching the order in which
> MAHOUT_OPTS is passed in bin/mahout from:
>
> exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
>
> to:
>
> exec "$JAVA" $JAVA_HEAP_MAX -classpath "$CLASSPATH" $CLASS "$@" $MAHOUT_OPTS
>
>
> I can't guarantee it won't break some other way of running it; it does not
> look like it will, but I have not tested it.
>
> Cheers,
>
> Mario
>