Posted to user@spot.apache.org by Vikash Kumar <vi...@oneconvergence.com> on 2017/07/21 18:38:57 UTC

[SPOT-ML] spot-ml fails to start

Hi all,

    I am a beginner trying to bring up Spot on a single node. I used
CDH 5.12 for the installation.

    While trying to start spot-ml, I am getting the error below:

 spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
--driver-memory --conf spark.driver.maxResultSize= --conf
spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores=
--conf spark.executor.memory= --conf
spark.sql.autoBroadcastJoinThreshold=10485760
--conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M
-XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf
spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf
spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar
--analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/
--dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
--ldatopiccount 20 --scored
/user/cloudera-scm/flow/scored_results/19731231/scores
--threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha
--ldabeta --ldaoptimizer --precision 64
Invalid initial heap size: -Xms--conf
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
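
The `-Xms--conf` in that message comes from `--driver-memory` being left without a value: the shell drops the empty expansion, so spark-submit takes the literal next token, `--conf`, as the heap size. A minimal sketch of the mechanism (the variable name stands in for whatever spot.conf would have supplied):

```shell
# Reproduce the failure mode: an empty config value makes the next flag
# become the option's argument, yielding an invalid -Xms value.
DRIVER_MEM=""                             # what an empty SPK_DRIVER_MEM expands to
set -- --driver-memory $DRIVER_MEM --conf spark.x=1
# After word splitting, "--conf" sits where the memory size should be:
shift                                     # drop --driver-memory
echo "heap size token: $1"                # prints: heap size token: --conf
```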


I have a machine with 16 GB of memory. Is it possible to run Spot on a
single machine?

What are the recommended values for the config below?

USER_DOMAIN=''

SPK_EXEC=''
SPK_EXEC_MEM=''
SPK_DRIVER_MEM=''
SPK_DRIVER_MAX_RESULTS=''
SPK_EXEC_CORES=''
SPK_DRIVER_MEM_OVERHEAD=''
SPK_EXEC_MEM_OVERHEAD=''
SPK_AUTO_BRDCST_JOIN_THR='10485760'

LDA_OPTIMIZER=''
LDA_ALPHA=''
LDA_BETA=''


Regards,
Vikash

Re: [SPOT-ML] spot-ml fails to start

Posted by "Barona, Ricardo" <ri...@intel.com>.
I tried to reproduce using CDH 5.11.1, but it executed correctly.

Hadoop 2.6.0-cdh5.11.1
Subversion http://github.com/cloudera/hadoop -r b581c269ca3610c603b6d7d1da0d14dfb6684aa3
Compiled by jenkins on 2017-06-01T17:42Z
Compiled with protoc 2.5.0
From source with checksum c6cbc4f20a8a571dd7c9f743984da1
This command was run using /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/hadoop-common-2.6.0-cdh5.11.1.jar


@Vikash, can you please try running `sbt assembly` on a different machine? Maybe your local machine, and then move only the jar to your node? It’s just a long shot; I’m not sure if that could be the problem.

Thanks.

From: Vikash Kumar <vi...@oneconvergence.com>
Reply-To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Date: Thursday, July 27, 2017 at 11:13 AM
To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Subject: Re: [SPOT-ML] spot-ml fails to start

I have this hadoop version installed:

hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:31Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/hadoop-common-2.6.0-cdh5.12.0.jar

Regards,
Vikash

"Without requirements or design, programming is the art of adding bugs to an empty text file."- Louis Srygley

On Thu, Jul 27, 2017 at 9:15 PM, Vikash Kumar <vi...@oneconvergence.com> wrote:
I updated to Spark 2.1.0, but after that I am getting this exception:

spark2-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=3 --conf spark.executor.cores=1 --conf spark.executor.memory=3g --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047 --conf spark.yarn.executor.memoryOverhead=3047 target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002 --ldabeta 1.0001 --ldaoptimizer em --precision 64
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments.handleUnknown(SparkSubmitArguments.scala:460)
    at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:178)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:98)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 5 more
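
A NoClassDefFoundError for org/apache/hadoop/fs/FSDataInputStream this early (inside spark-submit's own argument parser) usually means the Spark 2 build cannot see the Hadoop jars at all. With a "Hadoop free" Spark build, the usual remedy, sketched here with paths and guards that depend on the local install, is to export SPARK_DIST_CLASSPATH from `hadoop classpath` before submitting:

```shell
# Sketch: point a "Hadoop free" Spark 2 build at the cluster's Hadoop jars.
# (Guarded so it only takes effect where the hadoop CLI actually exists.)
if command -v hadoop >/dev/null 2>&1; then
  SPARK_DIST_CLASSPATH="$(hadoop classpath)"
  export SPARK_DIST_CLASSPATH
  echo "SPARK_DIST_CLASSPATH set"
else
  echo "hadoop CLI not on PATH; run this on the cluster node"
fi
```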

Regards,
Vikash

"Without requirements or design, programming is the art of adding bugs to an empty text file."- Louis Srygley

On Mon, Jul 24, 2017 at 10:02 PM, Vikash Kumar <vi...@oneconvergence.com> wrote:
Thanks Ricardo,
     I will update Spark to 2.1.0.

Regards,
Vikash

"Without requirements or design, programming is the art of adding bugs to an empty text file."- Louis Srygley

On Mon, Jul 24, 2017 at 8:30 PM, Barona, Ricardo <ri...@intel.com> wrote:
Hi Vikash,

Yeah, you need Spark 2.1.0. Documentation is getting updated as we speak.

As for user domain, it will take only one domain, like ‘intel’ or ‘mycompany’, but not a list (for now). If you don’t have a domain, don’t worry; that particular parameter can be left empty.

From: Vikash Kumar <vi...@oneconvergence.com>
Reply-To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Date: Monday, July 24, 2017 at 3:47 AM

To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Subject: Re: [SPOT-ML] spot-ml fails to start

Thanks for the quick reply. I have made the changes as suggested, but I am getting an exception here:

+ spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=3 --conf spark.executor.cores=1 --conf spark.executor.memory=3g --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047 --conf spark.yarn.executor.memoryOverhead=3047 target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002 --ldabeta 1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
    at scopt.OptionParser.parse(options.scala:370)
    at org.apache.spot.SuspiciousConnects$.main(SuspiciousConnects.scala:52)
    at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
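
A NoSuchMethodError on scala.runtime.IntRef.create is the classic signature of a Scala binary-version mismatch: an assembly built for Scala 2.11 being run by a Spark that was built against Scala 2.10 (as the Spark 1.6 shipped with CDH 5.x is). One quick way to check, sketched here with the jar path taken from the command above, is to read the Scala `library.properties` packed inside the assembly:

```shell
# Sketch: inspect the Scala runtime version bundled in the assembly jar.
JAR=target/scala-2.11/spot-ml-assembly-1.1.jar
if [ -f "$JAR" ]; then
  unzip -p "$JAR" library.properties | grep '^version.number'
else
  echo "assembly jar not found: $JAR"
fi
```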

Also, I didn't have a user domain set for my machine; do I need to set one? I tried setting '', <my compute hostname>, and 'intel' in /etc/spot.conf, but the exception was the same every time.
Do I need to install some other software (version)?

Regards,
Vikash

"Without requirements or design, programming is the art of adding bugs to an empty text file."- Louis Srygley

On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <ri...@intel.com> wrote:
I’m not sure if YARN is going to run on a single node, but you can try taking 9 GB out of the total 16 GB.
Divide 9 GB by 3; that’s going to be your memory per executor. You’re also going to have only 3 executors total.
Try using 2 cores per executor if that’s supported by your node, or assign only 1 core per executor.

USER_DOMAIN='<your domain>'

SPK_EXEC='3'
SPK_EXEC_MEM='3g'
SPK_DRIVER_MEM='1g'
SPK_DRIVER_MAX_RESULTS='1g'
SPK_EXEC_CORES='1'
SPK_DRIVER_MEM_OVERHEAD='102'
SPK_EXEC_MEM_OVERHEAD='307'
SPK_AUTO_BRDCST_JOIN_THR='10485760'

As for LDA, use the values I sent you on Slack:

LDA_OPTIMIZER='em'
LDA_ALPHA='1.002'
LDA_BETA='1.0001'
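
The sizing above works out as follows; a rough sketch, assuming about 9 GB of the 16 GB machine is handed to YARN executors and roughly 10% per executor is set aside as off-heap overhead:

```shell
# Rough single-node sizing for /etc/spot.conf.
YARN_GB=9                                    # memory handed to YARN executors
SPK_EXEC=3                                   # executor count
EXEC_MEM_GB=$((YARN_GB / SPK_EXEC))          # memory per executor
EXEC_OVERHEAD_MB=$((EXEC_MEM_GB * 1024 / 10))  # ~10% off-heap overhead, in MB
echo "SPK_EXEC='${SPK_EXEC}'"
echo "SPK_EXEC_MEM='${EXEC_MEM_GB}g'"
echo "SPK_EXEC_MEM_OVERHEAD='${EXEC_OVERHEAD_MB}'"
# prints: SPK_EXEC='3' / SPK_EXEC_MEM='3g' / SPK_EXEC_MEM_OVERHEAD='307'
```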

Let me know if that works.

From: Nathanael Smith <na...@apache.org>
Reply-To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Date: Friday, July 21, 2017 at 1:56 PM
To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Subject: Re: [SPOT-ML] spot-ml fails to start

Hi Vikash, and welcome!

To answer your last question first: without filling out these fields, you will get exactly the error you are seeing.
In the error message you can see that each configuration option is missing the value that should be provided by spot.conf.

To provide some values, you can follow this guide:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md
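
As a sketch of that pre-flight check (the sample file path and the variable list are assumptions drawn from the config block quoted above), a small shell guard can refuse to launch spark-submit while any value is still empty, instead of letting it mis-parse `--driver-memory --conf` into `-Xms--conf`:

```shell
# Fail fast if any Spark setting in a spot.conf-style file is an empty string.
check_spot_conf() {
  conf_file="$1"; missing=0
  for var in SPK_EXEC SPK_EXEC_MEM SPK_DRIVER_MEM SPK_DRIVER_MAX_RESULTS \
             SPK_EXEC_CORES SPK_DRIVER_MEM_OVERHEAD SPK_EXEC_MEM_OVERHEAD; do
    if grep -q "^${var}=''" "$conf_file"; then
      echo "ERROR: $var is empty in $conf_file"
      missing=1
    fi
  done
  return $missing
}

# Demo against a throwaway config with one value left blank.
cat > /tmp/spot.conf.sample <<'EOF'
SPK_EXEC='3'
SPK_EXEC_MEM=''
SPK_DRIVER_MEM='1g'
EOF
check_spot_conf /tmp/spot.conf.sample || echo "fix the config before running spot-ml"
```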

You will need to take into account not only the 16 GB of memory but also how many cores are available to your single node, and work it out from there.
I hope this helps get you started,

- Nathanael



On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vi...@oneconvergence.com> wrote:

Hi all,
    I am a beginner and trying to bring spot on single node. I have used CDH 5.12 for installation.
    While trying to start spot-ml, I am getting below this error:

 spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory --conf spark.driver.maxResultSize= --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores= --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha --ldabeta --ldaoptimizer --precision 64
Invalid initial heap size: -Xms--conf
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I have machine with 16 GB memory. Is it possible to run spot in single machine ?
What is the recommended values of below config ?

USER_DOMAIN=''

SPK_EXEC=''
SPK_EXEC_MEM=''
SPK_DRIVER_MEM=''
SPK_DRIVER_MAX_RESULTS=''
SPK_EXEC_CORES=''
SPK_DRIVER_MEM_OVERHEAD=''
SPK_EXEC_MEM_OVERHEAD=''
SPK_AUTO_BRDCST_JOIN_THR='10485760'

LDA_OPTIMIZER=''
LDA_ALPHA=''
LDA_BETA=''
Regards,
Vikash






Re: [SPOT-ML] spot-ml fails to start

Posted by Vikash Kumar <vi...@oneconvergence.com>.
I have this hadoop version installed:

hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r
dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:31Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/hadoop-common-2.6.0-cdh5.12.0.jar


Regards,
Vikash

*"Without requirements or design, programming is the art of adding bugs to
an empty text file."- Louis Srygley*

On Thu, Jul 27, 2017 at 9:15 PM, Vikash Kumar <
vikash.kumar@oneconvergence.com> wrote:

> I updated to spark 2.1.0, but after that I am getting this exception:
>
> spark2-submit --class org.apache.spot.SuspiciousConnects --master yarn
> --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
> --conf spark.dynamicAllocation.maxExecutors=3 --conf
> spark.executor.cores=1 --conf spark.executor.memory=3g --conf spark.sql.
> autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.
> extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf
> spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s
> --conf spark.yarn.am.memoryOverhead=3047 --conf spark.yarn.executor.memoryOverhead=3047
> target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
> /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000
> --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
> --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002
> --ldabeta 1.0001 --ldaoptimizer em --precision 64
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/fs/FSDataInputStream
>     at org.apache.spark.deploy.SparkSubmitArguments.handleUnknown(
> SparkSubmitArguments.scala:460)
>     at org.apache.spark.launcher.SparkSubmitOptionParser.parse(
> SparkSubmitOptionParser.java:178)
>     at org.apache.spark.deploy.SparkSubmitArguments.<init>(
> SparkSubmitArguments.scala:98)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.
> FSDataInputStream
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 5 more
>
>
> Regards,
> Vikash
>
> *"Without requirements or design, programming is the art of adding bugs to
> an empty text file."- Louis Srygley*
>
> On Mon, Jul 24, 2017 at 10:02 PM, Vikash Kumar <
> vikash.kumar@oneconvergence.com> wrote:
>
>> Thanks Ricardo,
>>
>>      I will update Spark to 2.1.0  .
>>
>> Regards,
>> Vikash
>>
>> *"Without requirements or design, programming is the art of adding bugs
>> to an empty text file."- Louis Srygley*
>>
>> On Mon, Jul 24, 2017 at 8:30 PM, Barona, Ricardo <
>> ricardo.barona@intel.com> wrote:
>>
>>> Hi Vikash,
>>>
>>>
>>>
>>> Yeah, you need Spark 2.1.0. Documentation is getting updated as we
>>> speak.
>>>
>>>
>>>
>>> As for user domain, it will take only one domain, like ‘intel’ or
>>> ‘mycompany’ but not a list (for now). If you don’t have any domain don’t
>>> worry, that particular parameter can be empty.
>>>
>>>
>>>
>>> *From: *Vikash Kumar <vi...@oneconvergence.com>
>>> *Reply-To: *"user@spot.incubator.apache.org" <
>>> user@spot.incubator.apache.org>
>>> *Date: *Monday, July 24, 2017 at 3:47 AM
>>>
>>> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
>>> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>>>
>>>
>>>
>>> Thanks for quick reply. I have made the changes as suggested but getting
>>> an exception here:
>>>
>>> + spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
>>> --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
>>> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
>>> --conf spark.dynamicAllocation.maxExecutors=3 --conf
>>> spark.executor.cores=1 --conf spark.executor.memory=3g --conf
>>> spark.sql.autoBroadcastJoinThreshold=10485760 --conf
>>> 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M
>>> -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf
>>> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047
>>> --conf spark.yarn.executor.memoryOverhead=3047
>>> target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
>>> /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000
>>> --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
>>> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
>>> --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002
>>> --ldabeta 1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
>>> Exception in thread "main" java.lang.NoSuchMethodError:
>>> scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
>>>     at scopt.OptionParser.parse(options.scala:370)
>>>     at org.apache.spot.SuspiciousConnects$.main(SuspiciousConnects.
>>> scala:52)
>>>     at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:730)
>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>> .scala:181)
>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>> a:206)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>>
>>> Also, I didn't have userdomain set for my machine, do I need to set it ?
>>> I tried setting - '', <my compute hostname>, 'intel' in /etc/spot.conf .
>>> But every time, the exception was same.
>>>
>>> Do I need to install some other software (version) ?
>>>
>>>
>>> Regards,
>>>
>>> Vikash
>>>
>>> *"Without requirements or design, programming is the art of adding bugs
>>> to an empty text file."- Louis Srygley*
>>>
>>>
>>>
>>> On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <
>>> ricardo.barona@intel.com> wrote:
>>>
>>> I’m not sure if yarn is going to run in a single node but you can try
>>> taking 9 GB out of the total 16GB.
>>>
>>> Divide 9GB by 3, that’s going to be your memory per executor. You’re
>>> also going to have only 3 executors total.
>>>
>>> Try using 2 cores per executor if that’ supported by your node (total
>>> number of executors) or assign only 1 core per executor.
>>>
>>>
>>>
>>> USER_DOMAIN='<your domain>'
>>>
>>> SPK_EXEC='3'
>>> SPK_EXEC_MEM='3g'
>>> SPK_DRIVER_MEM='1g'
>>> SPK_DRIVER_MAX_RESULTS='1g'
>>> SPK_EXEC_CORES='1'
>>> SPK_DRIVER_MEM_OVERHEAD='102'
>>> SPK_EXEC_MEM_OVERHEAD='307'
>>> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>>>
>>>
>>>
>>> As per LDA use the values I sent you on slack:
>>>
>>>
>>>
>>> LDA_OPTIMIZER='em'
>>> LDA_ALPHA='1.002'
>>> LDA_BETA='1.0001'
>>>
>>>
>>>
>>> Let me know if that works.
>>>
>>>
>>>
>>> *From: *Nathanael Smith <na...@apache.org>
>>> *Reply-To: *"user@spot.incubator.apache.org" <
>>> user@spot.incubator.apache.org>
>>> *Date: *Friday, July 21, 2017 at 1:56 PM
>>> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
>>> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>>>
>>>
>>>
>>> Hi Vikash, and welcome!
>>>
>>>
>>>
>>> To answer your last question first, it’s important to know that without
>>> filling out these fields you will get the error that you are seeing.
>>>
>>> In the error message you will see that each configuration option is
>>> missing it’s value that should be provided by the spot.conf.
>>>
>>>
>>>
>>> to provide some values you can follow this guide:
>>>
>>> https://github.com/apache/incubator-spot/blob/master/spot-ml
>>> /SPARKCONF.md
>>>
>>>
>>>
>>> You will need to take into account not only the 16gb of memory, but how
>>> many cores are available to your single node and work it out from there.
>>>
>>> I hope this helps get you started,
>>>
>>>
>>>
>>> - Nathanael
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Jul 21, 2017, at 11:38 AM, Vikash Kumar <
>>> vikash.kumar@oneconvergence.com> wrote:
>>>
>>>
>>>
>>> Hi all,
>>>
>>>     I am a beginner and trying to bring spot on single node. I have used
>>> CDH 5.12 for installation.
>>>
>>>     While trying to start spot-ml, I am getting below this error:
>>>
>>>  spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
>>> --driver-memory --conf spark.driver.maxResultSize= --conf
>>> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
>>> --conf spark.dynamicAllocation.maxExecutors= --conf
>>> spark.executor.cores= --conf spark.executor.memory= --conf
>>> spark.sql.autoBroadcastJoinThreshold=10485760 --conf
>>> 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M
>>> -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf
>>> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf
>>> spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar
>>> --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/
>>> --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
>>> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
>>> --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha
>>> --ldabeta --ldaoptimizer --precision 64
>>> Invalid initial heap size: -Xms--conf
>>> Error: Could not create the Java Virtual Machine.
>>> Error: A fatal exception has occurred. Program will exit.
>>>
>>> I have machine with 16 GB memory. Is it possible to run spot in single
>>> machine ?
>>>
>>> What is the recommended values of below config ?
>>>
>>> USER_DOMAIN=''
>>>
>>> SPK_EXEC=''
>>> SPK_EXEC_MEM=''
>>> SPK_DRIVER_MEM=''
>>> SPK_DRIVER_MAX_RESULTS=''
>>> SPK_EXEC_CORES=''
>>> SPK_DRIVER_MEM_OVERHEAD=''
>>> SPK_EXEC_MEM_OVERHEAD=''
>>> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>>>
>>> LDA_OPTIMIZER=''
>>> LDA_ALPHA=''
>>> LDA_BETA=''
>>>
>>> Regards,
>>>
>>> Vikash
>>>
>>>
>>>
>>>
>>>
>>
>>
>

Re: [SPOT-ML] spot-ml fails to start

Posted by Vikash Kumar <vi...@oneconvergence.com>.
I updated to spark 2.1.0, but after that I am getting this exception:

spark2-submit --class org.apache.spot.SuspiciousConnects --master yarn
--driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=3 --conf spark.executor.cores=1
--conf spark.executor.memory=3g --conf
spark.sql.autoBroadcastJoinThreshold=10485760 --conf
'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M'
--conf spark.kryoserializer.buffer.max=512m --conf
spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047 --conf
spark.yarn.executor.memoryOverhead=3047
target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
/user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback
/home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20
--scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold
1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002 --ldabeta
1.0001 --ldaoptimizer em --precision 64
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FSDataInputStream
    at
org.apache.spark.deploy.SparkSubmitArguments.handleUnknown(SparkSubmitArguments.scala:460)
    at
org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:178)
    at
org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:98)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 5 more


Regards,
Vikash

*"Without requirements or design, programming is the art of adding bugs to
an empty text file."- Louis Srygley*

On Mon, Jul 24, 2017 at 10:02 PM, Vikash Kumar <
vikash.kumar@oneconvergence.com> wrote:

> Thanks Ricardo,
>
>      I will update Spark to 2.1.0  .
>
> Regards,
> Vikash
>
> *"Without requirements or design, programming is the art of adding bugs to
> an empty text file."- Louis Srygley*
>
> On Mon, Jul 24, 2017 at 8:30 PM, Barona, Ricardo <ricardo.barona@intel.com
> > wrote:
>
>> Hi Vikash,
>>
>>
>>
>> Yeah, you need Spark 2.1.0. Documentation is getting updated as we speak.
>>
>>
>>
>> As for user domain, it will take only one domain, like ‘intel’ or
>> ‘mycompany’ but not a list (for now). If you don’t have any domain don’t
>> worry, that particular parameter can be empty.
>>
>>
>>
>> *From: *Vikash Kumar <vi...@oneconvergence.com>
>> *Reply-To: *"user@spot.incubator.apache.org" <
>> user@spot.incubator.apache.org>
>> *Date: *Monday, July 24, 2017 at 3:47 AM
>>
>> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
>> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>>
>>
>>
>> Thanks for quick reply. I have made the changes as suggested but getting
>> an exception here:
>>
>> + spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
>> --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
>> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
>> --conf spark.dynamicAllocation.maxExecutors=3 --conf
>> spark.executor.cores=1 --conf spark.executor.memory=3g --conf
>> spark.sql.autoBroadcastJoinThreshold=10485760 --conf
>> 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M'
>> --conf spark.kryoserializer.buffer.max=512m --conf
>> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047
>> --conf spark.yarn.executor.memoryOverhead=3047
>> target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
>> /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000
>> --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
>> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
>> --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002
>> --ldabeta 1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
>>     at scopt.OptionParser.parse(options.scala:370)
>>     at org.apache.spot.SuspiciousConnects$.main(SuspiciousConnects.
>> scala:52)
>>     at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:730)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:181)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>> Also, I didn't have userdomain set for my machine, do I need to set it ?
>> I tried setting - '', <my compute hostname>, 'intel' in /etc/spot.conf .
>> But every time, the exception was same.
>>
>> Do I need to install some other software (version) ?
>>
>>
>> Regards,
>>
>> Vikash
>>
>> *"Without requirements or design, programming is the art of adding bugs
>> to an empty text file."- Louis Srygley*
>>
>>
>>
>> On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <
>> ricardo.barona@intel.com> wrote:
>>
>> I’m not sure if yarn is going to run in a single node but you can try
>> taking 9 GB out of the total 16GB.
>>
>> Divide 9GB by 3, that’s going to be your memory per executor. You’re also
>> going to have only 3 executors total.
>>
>> Try using 2 cores per executor if that’ supported by your node (total
>> number of executors) or assign only 1 core per executor.
>>
>>
>>
>> USER_DOMAIN='<your domain>'
>>
>> SPK_EXEC='3'
>> SPK_EXEC_MEM='3g'
>> SPK_DRIVER_MEM='1g'
>> SPK_DRIVER_MAX_RESULTS='1g'
>> SPK_EXEC_CORES='1'
>> SPK_DRIVER_MEM_OVERHEAD='102'
>> SPK_EXEC_MEM_OVERHEAD='307'
>> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>>
>>
>>
>> As per LDA use the values I sent you on slack:
>>
>>
>>
>> LDA_OPTIMIZER='em'
>> LDA_ALPHA='1.002'
>> LDA_BETA='1.0001'
>>
>>
>>
>> Let me know if that works.
>>
>>
>>
>> *From: *Nathanael Smith <na...@apache.org>
>> *Reply-To: *"user@spot.incubator.apache.org" <
>> user@spot.incubator.apache.org>
>> *Date: *Friday, July 21, 2017 at 1:56 PM
>> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
>> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>>
>>
>>
>> Hi Vikash, and welcome!
>>
>>
>>
>> To answer your last question first, it’s important to know that without
>> filling out these fields you will get the error that you are seeing.
>>
>> In the error message you will see that each configuration option is
>> missing it’s value that should be provided by the spot.conf.
>>
>>
>>
>> to provide some values you can follow this guide:
>>
>> https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md
>>
>>
>>
>> You will need to take into account not only the 16gb of memory, but how
>> many cores are available to your single node and work it out from there.
>>
>> I hope this helps get you started,
>>
>>
>>
>> - Nathanael
>>
>>
>>
>>
>>
>>
>>
>> On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vikash.kumar@oneconvergence.c
>> om> wrote:
>>
>>
>>
>> Hi all,
>>
>>     I am a beginner trying to bring up Spot on a single node. I have
>> used CDH 5.12 for the installation.
>>
>>     While trying to start spot-ml, I am getting the error below:
>>
>>  spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
>> --driver-memory --conf spark.driver.maxResultSize= --conf
>> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
>> --conf spark.dynamicAllocation.maxExecutors= --conf
>> spark.executor.cores= --conf spark.executor.memory= --conf
>> spark.sql.autoBroadcastJoinThreshold=10485760 --conf
>> 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M'
>> --conf spark.kryoserializer.buffer.max=512m --conf
>> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf
>> spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar
>> --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/
>> --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
>> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
>> --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha
>> --ldabeta --ldaoptimizer --precision 64
>> Invalid initial heap size: -Xms--conf
>> Error: Could not create the Java Virtual Machine.
>> Error: A fatal exception has occurred. Program will exit.
>>
>> I have a machine with 16 GB of memory. Is it possible to run Spot on a
>> single machine?
>>
>> What are the recommended values for the config below?
>>
>> USER_DOMAIN=''
>>
>> SPK_EXEC=''
>> SPK_EXEC_MEM=''
>> SPK_DRIVER_MEM=''
>> SPK_DRIVER_MAX_RESULTS=''
>> SPK_EXEC_CORES=''
>> SPK_DRIVER_MEM_OVERHEAD=''
>> SPK_EXEC_MEM_OVERHEAD=''
>> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>>
>> LDA_OPTIMIZER=''
>> LDA_ALPHA=''
>> LDA_BETA=''
>>
>> Regards,
>>
>> Vikash
>>
>>
>>
>>
>>
>
>

Re: [SPOT-ML] spot-ml fails to start

Posted by Vikash Kumar <vi...@oneconvergence.com>.
Thanks Ricardo,

     I will update Spark to 2.1.0.

Regards,
Vikash

*"Without requirements or design, programming is the art of adding bugs to
an empty text file."- Louis Srygley*

On Mon, Jul 24, 2017 at 8:30 PM, Barona, Ricardo <ri...@intel.com>
wrote:

> Hi Vikash,
>
>
>
> Yeah, you need Spark 2.1.0. Documentation is getting updated as we speak.
>
>
>
> As for the user domain, it will take only one domain, like ‘intel’ or
> ‘mycompany’, but not a list (for now). If you don’t have any domain,
> don’t worry; that particular parameter can be empty.
>
>
>
> *From: *Vikash Kumar <vi...@oneconvergence.com>
> *Reply-To: *"user@spot.incubator.apache.org" <user@spot.incubator.apache.
> org>
> *Date: *Monday, July 24, 2017 at 3:47 AM
>
> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>
>
>
> Thanks for the quick reply. I have made the changes as suggested, but I
> am getting an exception:
>
> + spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
> --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
> --conf spark.dynamicAllocation.maxExecutors=3 --conf
> spark.executor.cores=1 --conf spark.executor.memory=3g --conf spark.sql.
> autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.
> extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf
> spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s
> --conf spark.yarn.am.memoryOverhead=3047 --conf spark.yarn.executor.memoryOverhead=3047
> target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
> /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000
> --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
> --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002
> --ldabeta 1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
>     at scopt.OptionParser.parse(options.scala:370)
>     at org.apache.spot.SuspiciousConnects$.main(
> SuspiciousConnects.scala:52)
>     at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
> Also, I didn't have a userdomain set for my machine; do I need to set
> one? I tried setting '', <my compute hostname>, and 'intel' in
> /etc/spot.conf, but the exception was the same every time.
>
> Do I need to install some other software version?
>
>
> Regards,
>
> Vikash
>
> *"Without requirements or design, programming is the art of adding bugs to
> an empty text file."- Louis Srygley*
>
>
>
> On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <
> ricardo.barona@intel.com> wrote:
>
> I’m not sure if YARN is going to run on a single node, but you can try
> taking 9 GB out of the total 16 GB.
>
> Divide 9 GB by 3; that’s going to be your memory per executor. You’re
> also going to have only 3 executors total.
>
> Try using 2 cores per executor if that’s supported by your node (total
> number of executors), or assign only 1 core per executor.
>
>
>
> USER_DOMAIN='<your domain>'
>
> SPK_EXEC='3'
> SPK_EXEC_MEM='3g'
> SPK_DRIVER_MEM='1g'
> SPK_DRIVER_MAX_RESULTS='1g'
> SPK_EXEC_CORES='1'
> SPK_DRIVER_MEM_OVERHEAD='102'
> SPK_EXEC_MEM_OVERHEAD='307'
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>
>
>
> As for LDA, use the values I sent you on Slack:
>
>
>
> LDA_OPTIMIZER='em'
> LDA_ALPHA='1.002'
> LDA_BETA='1.0001'
>
>
>
> Let me know if that works.
>
>
>
> *From: *Nathanael Smith <na...@apache.org>
> *Reply-To: *"user@spot.incubator.apache.org" <user@spot.incubator.apache.
> org>
> *Date: *Friday, July 21, 2017 at 1:56 PM
> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>
>
>
> Hi Vikash, and welcome!
>
>
>
> To answer your last question first, it’s important to know that without
> filling out these fields you will get the error that you are seeing.
>
> In the error message you will see that each configuration option is
> missing its value, which should be provided by spot.conf.
>
>
>
> To provide some values, you can follow this guide:
>
> https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md
>
>
>
> You will need to take into account not only the 16 GB of memory, but
> also how many cores are available to your single node, and work it out
> from there.
>
> I hope this helps get you started,
>
>
>
> - Nathanael
>
>
>
>
>
>
>
> On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vikash.kumar@oneconvergence.
> com> wrote:
>
>
>
> Hi all,
>
>     I am a beginner trying to bring up Spot on a single node. I have
> used CDH 5.12 for the installation.
>
>     While trying to start spot-ml, I am getting the error below:
>
>  spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
> --driver-memory --conf spark.driver.maxResultSize= --conf
> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
> --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores=
> --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760
> --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M
> -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf
> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf
> spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar
> --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/
> --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
> --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha
> --ldabeta --ldaoptimizer --precision 64
> Invalid initial heap size: -Xms--conf
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
>
> I have a machine with 16 GB of memory. Is it possible to run Spot on a
> single machine?
>
> What are the recommended values for the config below?
>
> USER_DOMAIN=''
>
> SPK_EXEC=''
> SPK_EXEC_MEM=''
> SPK_DRIVER_MEM=''
> SPK_DRIVER_MAX_RESULTS=''
> SPK_EXEC_CORES=''
> SPK_DRIVER_MEM_OVERHEAD=''
> SPK_EXEC_MEM_OVERHEAD=''
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>
> LDA_OPTIMIZER=''
> LDA_ALPHA=''
> LDA_BETA=''
>
> Regards,
>
> Vikash
>
>
>
>
>

Re: [SPOT-ML] spot-ml fails to start

Posted by "Barona, Ricardo" <ri...@intel.com>.
Hi Vikash,

Yeah, you need Spark 2.1.0. Documentation is getting updated as we speak.

As for the user domain, it will take only one domain, like ‘intel’ or ‘mycompany’, but not a list (for now). If you don’t have any domain, don’t worry; that particular parameter can be empty.
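A quick way to confirm that version pairing (the flag is standard spark-submit; the exact banner text depends on your install) is:

```shell
# Print Spark's version banner and pull out the Scala line.
# spot-ml-assembly-1.1.jar is built for Scala 2.11, so this should report
# "Using Scala version 2.11.x"; the stock Spark 1.6 in older CDH releases
# ships Scala 2.10, which produces the reported NoSuchMethodError on
# scala.runtime.IntRef.create.
spark-submit --version 2>&1 | grep -i "scala"
```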

From: Vikash Kumar <vi...@oneconvergence.com>
Reply-To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Date: Monday, July 24, 2017 at 3:47 AM
To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Subject: Re: [SPOT-ML] spot-ml fails to start

Thanks for the quick reply. I have made the changes as suggested, but I am getting an exception:

+ spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory 1g --conf spark.driver.maxResultSize=1g --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=3 --conf spark.executor.cores=1 --conf spark.executor.memory=3g --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047 --conf spark.yarn.executor.memoryOverhead=3047 target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002 --ldabeta 1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
    at scopt.OptionParser.parse(options.scala:370)
    at org.apache.spot.SuspiciousConnects$.main(SuspiciousConnects.scala:52)
    at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Also, I didn't have a userdomain set for my machine; do I need to set one? I tried setting '', <my compute hostname>, and 'intel' in /etc/spot.conf, but the exception was the same every time.
Do I need to install some other software version?

Regards,
Vikash

"Without requirements or design, programming is the art of adding bugs to an empty text file."- Louis Srygley

On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <ri...@intel.com>> wrote:
I’m not sure if YARN is going to run on a single node, but you can try taking 9 GB out of the total 16 GB.
Divide 9 GB by 3; that’s going to be your memory per executor. You’re also going to have only 3 executors total.
Try using 2 cores per executor if that’s supported by your node (total number of executors), or assign only 1 core per executor.

USER_DOMAIN='<your domain>'

SPK_EXEC='3'
SPK_EXEC_MEM='3g'
SPK_DRIVER_MEM='1g'
SPK_DRIVER_MAX_RESULTS='1g'
SPK_EXEC_CORES='1'
SPK_DRIVER_MEM_OVERHEAD='102'
SPK_EXEC_MEM_OVERHEAD='307'
SPK_AUTO_BRDCST_JOIN_THR='10485760'

As for LDA, use the values I sent you on Slack:

LDA_OPTIMIZER='em'
LDA_ALPHA='1.002'
LDA_BETA='1.0001'

Let me know if that works.

From: Nathanael Smith <na...@apache.org>>
Reply-To: "user@spot.incubator.apache.org<ma...@spot.incubator.apache.org>" <us...@spot.incubator.apache.org>>
Date: Friday, July 21, 2017 at 1:56 PM
To: "user@spot.incubator.apache.org<ma...@spot.incubator.apache.org>" <us...@spot.incubator.apache.org>>
Subject: Re: [SPOT-ML] spot-ml fails to start

Hi Vikash, and welcome!

To answer your last question first, it’s important to know that without filling out these fields you will get the error that you are seeing.
In the error message you will see that each configuration option is missing its value, which should be provided by spot.conf.

To provide some values, you can follow this guide:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md

You will need to take into account not only the 16 GB of memory, but also how many cores are available to your single node, and work it out from there.
I hope this helps get you started,

- Nathanael



On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vi...@oneconvergence.com>> wrote:

Hi all,
    I am a beginner trying to bring up Spot on a single node. I have used CDH 5.12 for the installation.
    While trying to start spot-ml, I am getting the error below:

 spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory --conf spark.driver.maxResultSize= --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores= --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha --ldabeta --ldaoptimizer --precision 64
Invalid initial heap size: -Xms--conf
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I have a machine with 16 GB of memory. Is it possible to run Spot on a single machine?
What are the recommended values for the config below?

USER_DOMAIN=''

SPK_EXEC=''
SPK_EXEC_MEM=''
SPK_DRIVER_MEM=''
SPK_DRIVER_MAX_RESULTS=''
SPK_EXEC_CORES=''
SPK_DRIVER_MEM_OVERHEAD=''
SPK_EXEC_MEM_OVERHEAD=''
SPK_AUTO_BRDCST_JOIN_THR='10485760'

LDA_OPTIMIZER=''
LDA_ALPHA=''
LDA_BETA=''
Regards,
Vikash



Re: [SPOT-ML] spot-ml fails to start

Posted by Vikash Kumar <vi...@oneconvergence.com>.
Thanks for the quick reply. I have made the changes as suggested, but I am
getting an exception:

+ spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
--driver-memory 1g --conf spark.driver.maxResultSize=1g --conf
spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.maxExecutors=3 --conf spark.executor.cores=1
--conf spark.executor.memory=3g --conf
spark.sql.autoBroadcastJoinThreshold=10485760 --conf
'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M'
--conf spark.kryoserializer.buffer.max=512m --conf
spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=3047 --conf
spark.yarn.executor.memoryOverhead=3047
target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input
/user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback
/home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20
--scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold
1e-20 --maxresults 200 --ldamaxiterations 20 --ldaalpha 1.002 --ldabeta
1.0001 --ldaoptimizer em --precision 64 --userdomain compute-3
Exception in thread "main" java.lang.NoSuchMethodError:
scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
    at scopt.OptionParser.parse(options.scala:370)
    at org.apache.spot.SuspiciousConnects$.main(SuspiciousConnects.scala:52)
    at org.apache.spot.SuspiciousConnects.main(SuspiciousConnects.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



Also, I didn't have a userdomain set for my machine; do I need to set one?
I tried setting '', <my compute hostname>, and 'intel' in /etc/spot.conf,
but the exception was the same every time.

Do I need to install some other software version?

Regards,
Vikash

*"Without requirements or design, programming is the art of adding bugs to
an empty text file."- Louis Srygley*

On Sat, Jul 22, 2017 at 12:35 AM, Barona, Ricardo <ri...@intel.com>
wrote:

> I’m not sure if YARN is going to run on a single node, but you can try
> taking 9 GB out of the total 16 GB.
>
> Divide 9 GB by 3; that’s going to be your memory per executor. You’re
> also going to have only 3 executors total.
>
> Try using 2 cores per executor if that’s supported by your node (total
> number of executors), or assign only 1 core per executor.
>
>
>
> USER_DOMAIN='<your domain>'
>
> SPK_EXEC='3'
> SPK_EXEC_MEM='3g'
> SPK_DRIVER_MEM='1g'
> SPK_DRIVER_MAX_RESULTS='1g'
> SPK_EXEC_CORES='1'
> SPK_DRIVER_MEM_OVERHEAD='102'
> SPK_EXEC_MEM_OVERHEAD='307'
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>
>
>
> As for LDA, use the values I sent you on Slack:
>
>
>
> LDA_OPTIMIZER='em'
> LDA_ALPHA='1.002'
> LDA_BETA='1.0001'
>
>
>
> Let me know if that works.
>
>
>
> *From: *Nathanael Smith <na...@apache.org>
> *Reply-To: *"user@spot.incubator.apache.org" <user@spot.incubator.apache.
> org>
> *Date: *Friday, July 21, 2017 at 1:56 PM
> *To: *"user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
> *Subject: *Re: [SPOT-ML] spot-ml fails to start
>
>
>
> Hi Vikash, and welcome!
>
>
>
> To answer your last question first, it’s important to know that without
> filling out these fields you will get the error that you are seeing.
>
> In the error message you will see that each configuration option is
> missing its value, which should be provided by spot.conf.
>
>
>
> To provide some values, you can follow this guide:
>
> https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md
>
>
>
> You will need to take into account not only the 16 GB of memory, but
> also how many cores are available to your single node, and work it out
> from there.
>
> I hope this helps get you started,
>
>
>
> - Nathanael
>
>
>
>
>
>
>
> On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vikash.kumar@oneconvergence.
> com> wrote:
>
>
>
> Hi all,
>
>     I am a beginner trying to bring up Spot on a single node. I have
> used CDH 5.12 for the installation.
>
>     While trying to start spot-ml, I am getting the error below:
>
>  spark-submit --class org.apache.spot.SuspiciousConnects --master yarn
> --driver-memory --conf spark.driver.maxResultSize= --conf
> spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true
> --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores=
> --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760
> --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M
> -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf
> spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf
> spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar
> --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/
> --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv
> --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores
> --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha
> --ldabeta --ldaoptimizer --precision 64
> Invalid initial heap size: -Xms--conf
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
>
> I have a machine with 16 GB of memory. Is it possible to run Spot on a
> single machine?
>
> What are the recommended values for the config below?
>
> USER_DOMAIN=''
>
> SPK_EXEC=''
> SPK_EXEC_MEM=''
> SPK_DRIVER_MEM=''
> SPK_DRIVER_MAX_RESULTS=''
> SPK_EXEC_CORES=''
> SPK_DRIVER_MEM_OVERHEAD=''
> SPK_EXEC_MEM_OVERHEAD=''
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
>
> LDA_OPTIMIZER=''
> LDA_ALPHA=''
> LDA_BETA=''
>
> Regards,
>
> Vikash
>
>
>

Re: [SPOT-ML] spot-ml fails to start

Posted by "Barona, Ricardo" <ri...@intel.com>.
I’m not sure if YARN is going to run on a single node, but you can try taking 9 GB out of the total 16 GB.
Divide 9 GB by 3; that’s going to be your memory per executor. You’re also going to have only 3 executors total.
Try using 2 cores per executor if that’s supported by your node (total number of executors), or assign only 1 core per executor.

USER_DOMAIN='<your domain>'

SPK_EXEC='3'
SPK_EXEC_MEM='3g'
SPK_DRIVER_MEM='1g'
SPK_DRIVER_MAX_RESULTS='1g'
SPK_EXEC_CORES='1'
SPK_DRIVER_MEM_OVERHEAD='102'
SPK_EXEC_MEM_OVERHEAD='307'
SPK_AUTO_BRDCST_JOIN_THR='10485760'
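The arithmetic behind those numbers can be sketched as follows; this is a rough sanity check, not part of Spot, and the 10% overhead mirrors Spark's default spark.yarn.executor.memoryOverhead (10% of executor memory, minimum 384 MB).

```shell
# Rough sizing for a 16 GB single node, reserving ~9 GB for Spark executors
# as suggested above. All numbers are assumptions for illustration.
SPARK_BUDGET_MB=$((9 * 1024))                      # 9216 MB for executors
NUM_EXECUTORS=3
EXEC_MEM_MB=$((SPARK_BUDGET_MB / NUM_EXECUTORS))   # 3072 MB -> SPK_EXEC_MEM='3g'
EXEC_OVERHEAD_MB=$((EXEC_MEM_MB / 10))             # 307 MB  -> SPK_EXEC_MEM_OVERHEAD='307'
DRIVER_MEM_MB=1024                                 # SPK_DRIVER_MEM='1g'
DRIVER_OVERHEAD_MB=$((DRIVER_MEM_MB / 10))         # 102 MB  -> SPK_DRIVER_MEM_OVERHEAD='102'
TOTAL_MB=$((NUM_EXECUTORS * (EXEC_MEM_MB + EXEC_OVERHEAD_MB) + DRIVER_MEM_MB + DRIVER_OVERHEAD_MB))
echo "total YARN footprint: ${TOTAL_MB} MB"        # 11263 MB, leaving headroom for the OS and CDH services
```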

As for LDA, use the values I sent you on Slack:

LDA_OPTIMIZER='em'
LDA_ALPHA='1.002'
LDA_BETA='1.0001'

Let me know if that works.

From: Nathanael Smith <na...@apache.org>
Reply-To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Date: Friday, July 21, 2017 at 1:56 PM
To: "user@spot.incubator.apache.org" <us...@spot.incubator.apache.org>
Subject: Re: [SPOT-ML] spot-ml fails to start

Hi Vikash, and welcome!

To answer your last question first, it’s important to know that without filling out these fields you will get the error that you are seeing.
In the error message you will see that each configuration option is missing its value, which should be provided by spot.conf.

To provide some values, you can follow this guide:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md

You will need to take into account not only the 16 GB of memory, but also how many cores are available to your single node, and work it out from there.
I hope this helps get you started,

- Nathanael



On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vi...@oneconvergence.com>> wrote:

Hi all,
    I am a beginner trying to bring up Spot on a single node. I have used CDH 5.12 for the installation.
    While trying to start spot-ml, I am getting the error below:

 spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory --conf spark.driver.maxResultSize= --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores= --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha --ldabeta --ldaoptimizer --precision 64
Invalid initial heap size: -Xms--conf
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

I have a machine with 16 GB of memory. Is it possible to run Spot on a single machine?
What are the recommended values for the config below?

USER_DOMAIN=''

SPK_EXEC=''
SPK_EXEC_MEM=''
SPK_DRIVER_MEM=''
SPK_DRIVER_MAX_RESULTS=''
SPK_EXEC_CORES=''
SPK_DRIVER_MEM_OVERHEAD=''
SPK_EXEC_MEM_OVERHEAD=''
SPK_AUTO_BRDCST_JOIN_THR='10485760'

LDA_OPTIMIZER=''
LDA_ALPHA=''
LDA_BETA=''

Regards,
Vikash


Re: [SPOT-ML] spot-ml fails to start

Posted by Nathanael Smith <na...@apache.org>.
Hi Vikash, and welcome!

To answer your last question first, it’s important to know that without filling out these fields you will get the error that you are seeing.
In the error message you will see that each configuration option is missing its value, which should be provided by spot.conf.
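That failure mode can be reproduced in a few lines of shell; this is an illustrative sketch, not the actual spot-ml launch script. An empty variable from spot.conf disappears during word splitting, so the next flag is consumed as the missing value:

```shell
# SPK_DRIVER_MEM is blank because spot.conf was never filled in.
SPK_DRIVER_MEM=''
# The launch script builds the spark-submit argument list unquoted:
set -- --driver-memory ${SPK_DRIVER_MEM} --conf spark.driver.maxResultSize=1g
# The empty expansion vanishes, so "--conf" slides into the value slot:
echo "value seen for --driver-memory: $2"          # prints "--conf"
# spark-submit then builds the JVM flag -Xms$2, which becomes "-Xms--conf"
# and fails with: Invalid initial heap size: -Xms--conf
```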

To provide some values, you can follow this guide:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SPARKCONF.md

You will need to take into account not only the 16 GB of memory, but also how many cores are available to your single node, and work it out from there.
I hope this helps get you started,

- Nathanael



> On Jul 21, 2017, at 11:38 AM, Vikash Kumar <vi...@oneconvergence.com> wrote:
> 
> Hi all,
> 
>     I am a beginner trying to bring up Spot on a single node. I have used CDH 5.12 for the installation.
> 
>     While trying to start spot-ml, I am getting the error below:
> 
>  spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --driver-memory --conf spark.driver.maxResultSize= --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors= --conf spark.executor.cores= --conf spark.executor.memory= --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead= --conf spark.yarn.executor.memoryOverhead= target/scala-2.11/spot-ml-assembly-1.1.jar --analysis flow --input /user/cloudera-scm/flow/hive/y=1973/m=12/d=31/ --dupfactor 1000 --feedback /home/cloudera-scm/ml/flow/19731231/flow_scores.csv --ldatopiccount 20 --scored /user/cloudera-scm/flow/scored_results/19731231/scores --threshold 1e-20 --maxresults 50 --ldamaxiterations 20 --ldaalpha --ldabeta --ldaoptimizer --precision 64
> Invalid initial heap size: -Xms--conf
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> 
> 
> I have a machine with 16 GB of memory. Is it possible to run Spot on a single machine?
> 
> What are the recommended values for the config below?
> 
> USER_DOMAIN=''
> 
> SPK_EXEC=''
> SPK_EXEC_MEM=''
> SPK_DRIVER_MEM=''
> SPK_DRIVER_MAX_RESULTS=''
> SPK_EXEC_CORES=''
> SPK_DRIVER_MEM_OVERHEAD=''
> SPK_EXEC_MEM_OVERHEAD=''
> SPK_AUTO_BRDCST_JOIN_THR='10485760'
> 
> LDA_OPTIMIZER=''
> LDA_ALPHA=''
> LDA_BETA=''
> 
> 
> Regards,
> Vikash