Posted to dev@mahout.apache.org by Andrew Palumbo <ap...@outlook.com> on 2014/09/11 18:22:08 UTC

setting spark config parameters for shell

Does anybody know of an easy way to set the config parameters for the mahout spark-shell?  

I need to adjust: spark.kryoserializer.buffer.mb

I've been digging through the Spark docs but am not having much luck.

RE: setting spark config parameters for shell

Posted by Andrew Palumbo <ap...@outlook.com>.
I think this might actually be a Spark bug, possibly related to:

https://issues.apache.org/jira/browse/SPARK-2678

Even though we don't use spark-submit to start the Mahout shell, it seems that the CLI options are being dropped somewhere in

SparkILoop.process(args)

I get the same error:

bad option: '--driver-java-options=-Dspark.kryoserializer.buffer.mb=200'

running: $ mahout spark-shell --driver-java-options="-Dspark.kryoserializer.buffer.mb=200"

and: $SPARK_HOME/bin/spark-shell --driver-java-options="-Dspark.kryoserializer.buffer.mb=200"

> From: ap.dev@outlook.com
> To: dev@mahout.apache.org
> Subject: RE: setting spark config parameters for shell
> Date: Thu, 11 Sep 2014 13:09:21 -0400
> 
> I'm just using it to test out some of the changes that I've made to NB at the math-scala level.  It's great to test these abstract things out with.  
> 
> I'm gonna look through the mahout script; I seem to remember that's around where I stopped looking a couple of months ago.
> 
> 
> > Date: Thu, 11 Sep 2014 09:58:53 -0700
> > Subject: Re: setting spark config parameters for shell
> > From: dlieu.7@gmail.com
> > To: dev@mahout.apache.org
> > 
> > Yeah, these things need to be tweaked for a particular application. Truth
> > is, I have not used the shell for anything formidable yet. For me at
> > this point it is just a fine concept. I've been doing embedded Spark use
> > (at which point one of course has full control over the SparkConf settings).
> > 
> > On Thu, Sep 11, 2014 at 9:55 AM, Andrew Palumbo <ap...@outlook.com> wrote:
> > 
> > > Thanks. I was looking into this a while back but got sidetracked
> > > and am just coming back to it. I do remember thinking that the
> > > arguments may have been dropped by /bin/mahout spark-shell.
> > >
> > > I tried: $/bin/mahout spark-shell -Dspark.kryoserializer.buffer.mb=200
> > >
> > > but it's not showing up in the properties on the Environment tab of
> > > localhost:8080 -> "Mahout Spark Shell".
> > >
> > > I'll look back into /bin/mahout to see if there's a problem there.
> > >
> > > I'm getting the following error:
> > >
> > > com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
> > > required: 1274
> > >
> > > after doing some work on a Drm containing the output of seq2sparse from
> > > the 20newsgroups example.
> > >
> > > it seems to be failing at .collect
> > >
> > >
> > >
> > >
> > > > Date: Thu, 11 Sep 2014 09:39:32 -0700
> > > > Subject: Re: setting spark config parameters for shell
> > > > From: dlieu.7@gmail.com
> > > > To: dev@mahout.apache.org
> > > >
> > > > I remember I had a good answer for these kinds of things in the context of the
> > > > shell, but have forgotten the answer... bummer :)
> > > >
> > > > In Spark, you can just pass them in with -Dname=value. It may need tweaking the
> > > > bin/mahout script, though. That's what I don't remember.
> > > >
> > > > I thought we were setting a reasonable default though..
> > > >
> > > >
> > > > On Thu, Sep 11, 2014 at 9:22 AM, Andrew Palumbo <ap...@outlook.com>
> > > wrote:
> > > >
> > > > > Does anybody know of an easy way to set the config parameters for the
> > > > > mahout spark-shell?
> > > > >
> > > > > I need to adjust: spark.kryoserializer.buffer.mb
> > > > >
> > > > > I've been digging through the Spark docs but am not having much luck.
> > > > >
> > > > >
> > >
> > >

RE: setting spark config parameters for shell

Posted by Andrew Palumbo <ap...@outlook.com>.
I'm just using it to test out some of the changes that I've made to NB at the math-scala level.  It's great to test these abstract things out with.  

I'm gonna look through the mahout script; I seem to remember that's around where I stopped looking a couple of months ago.


Re: setting spark config parameters for shell

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yeah, these things need to be tweaked for a particular application. Truth
is, I have not used the shell for anything formidable yet. For me at
this point it is just a fine concept. I've been doing embedded Spark use
(at which point one of course has full control over the SparkConf settings).


RE: setting spark config parameters for shell

Posted by Andrew Palumbo <ap...@outlook.com>.
Thanks. I was looking into this a while back but got sidetracked and am just coming back to it. I do remember thinking that the arguments may have been dropped by /bin/mahout spark-shell.

I tried: $/bin/mahout spark-shell -Dspark.kryoserializer.buffer.mb=200

but it's not showing up in the properties on the Environment tab of localhost:8080 -> "Mahout Spark Shell".

I'll look back into /bin/mahout to see if there's a problem there.

I'm getting the following error:

com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1274

after doing some work on a Drm containing the output of seq2sparse from the 20newsgroups example.

It seems to be failing at .collect
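
One way to tell whether the launcher script dropped the flag or Spark simply ignored it is to look at the command line of the running shell JVM. The sketch below is only an illustration under stated assumptions: has_jvm_prop is a hypothetical helper, and the command line string is made up. In practice the string could come from something like ps -o args= -p "$PID", where PID is the process ID of the shell's JVM.

```shell
# has_jvm_prop CMDLINE NAME
# Succeeds if CMDLINE contains a JVM system property of the form -DNAME=...
# (hypothetical helper, not part of Mahout or Spark)
has_jvm_prop() {
  case " $1 " in
    *" -D$2="*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Made-up command line for illustration only.
cmdline='java -Xmx2g -Dspark.kryoserializer.buffer.mb=200 -classpath /tmp/x Main'

if has_jvm_prop "$cmdline" spark.kryoserializer.buffer.mb; then
  echo "property reached the JVM command line"
else
  echo "property was dropped before the JVM started"
fi
```

If the property is missing from the JVM command line, the launcher script is the likely place to look; if it is present but still absent from the Environment tab, the problem is more likely on the Spark side.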





Re: setting spark config parameters for shell

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I remember I had a good answer for these kinds of things in the context of the
shell, but have forgotten the answer... bummer :)

In Spark, you can just pass them in with -Dname=value. It may need tweaking the
bin/mahout script, though. That's what I don't remember.

I thought we were setting a reasonable default, though.
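
To illustrate the kind of bin/mahout tweak mentioned above, here is a minimal sketch of a launcher fragment that separates -Dname=value arguments from the remaining arguments, so the former can be placed before the main class on the java command line. This is an assumption about how such forwarding could be done, not the actual contents of bin/mahout; split_args, JVM_OPTS and APP_ARGS are hypothetical names.

```shell
#!/bin/sh
# Hypothetical launcher fragment; NOT the actual bin/mahout script.
# Splits the arguments into JVM system properties (-Dname=value)
# and everything else (sub-command and application arguments).
split_args() {
  JVM_OPTS=""
  APP_ARGS=""
  for arg in "$@"; do
    case "$arg" in
      -D*=*) JVM_OPTS="$JVM_OPTS $arg" ;;
      *)     APP_ARGS="$APP_ARGS $arg" ;;
    esac
  done
  # Trim the leading space left by the concatenation above.
  JVM_OPTS="${JVM_OPTS# }"
  APP_ARGS="${APP_ARGS# }"
}

split_args spark-shell -Dspark.kryoserializer.buffer.mb=200
echo "jvm: $JVM_OPTS"
echo "app: $APP_ARGS"

# A real launcher would then run something along these lines
# (JAVA, CLASSPATH and MAIN_CLASS are placeholders):
#   "$JAVA" $JVM_OPTS -classpath "$CLASSPATH" "$MAIN_CLASS" $APP_ARGS
```

Because the options are joined into plain strings, values containing spaces would be split incorrectly; a real script would need to handle quoting more carefully.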

