You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@predictionio.apache.org by "Miller, Clifford" <cl...@phoenix-opsgroup.com> on 2018/05/27 23:02:47 UTC

PIO 0.12.1 with HDP Spark on YARN

*I've installed an HDP cluster with Hbase and Spark with YARN.  As part of
that installation I created some HDP (Ambari) managed clients.  I installed
PIO on one of these clients and configured PIO to use the HDP installed
Hadoop, HBase, and Spark.  When I run the command 'pio eventserver &', I
get the following error.*

####
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5:
integer expression expected
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[: 2.2.6.2.14-5:
syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
/home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[: 2.2.6.2.14-5:
syntax error: invalid arithmetic operator (error token is ".2.6.2.14-5")
You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/ which
does not meet the minimum version requirement of 1.3.0.
Aborting.

####

*If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE with
an empty file, I can then start the Eventserver, which gives me the
following message:*

###
/usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a known
problem with certain vendors (e.g. Cloudera). Please make sure you are
using at least 1.3.0.
[INFO] [Management$] Creating Event Server at 0.0.0.0:7070
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot
be used because libhadoop cannot be loaded.
[INFO] [HttpListener] Bound to /0.0.0.0:7070
[INFO] [EventServerActor] Bound received. EventServer is ready.
####

*I can then send events to the Eventserver.  After sending the events
listed in the SimilarProduct Recommender example I am unable to train.
Using the cluster.  If I use 'pio train' then it successfully trains
locally.  If I atttempt to use the command "pio train -- --master yarn"
then I get the following:*

#######
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
        at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
        at
org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
        at
org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
        at scala.Option.foreach(Option.scala:257)
        at
org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
        at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
        at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
        at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at
org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at
org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at
org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
        at
org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
        at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

########

*What is the correct way to get PIO to use the YARN based Spark for
training?*

*Thanks,*

*--Cliff.*

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by suyash kharade <su...@gmail.com>.

I installed PIO on one of hdp nodes.

On Wed, May 30, 2018 at 10:25 PM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> Are you installing PIO on a client node created by HDP or something else?
>
>
>
> On Wed, May 30, 2018 at 2:25 PM, suyash kharade <su...@gmail.com>
> wrote:
>
>> I am using hdp 2.6.4
>>
>> On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> That's the command that I'm using but it gives me the exception that I
>>> listed in the previous email.  I've installed a Spark standalone cluster
>>> and am using that for training for now but would like to use Spark on YARN
>>> eventually.
>>>
>>> Are you using HDP? If so, what version of HDP are you using?  I'm using
>>> *HDP-2.6.2.14.*
>>>
>>>
>>>
>>> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <
>>> suyash.kharade@gmail.com> wrote:
>>>
>>>> I use 'pio train -- --master yarn'
>>>> It works for me to train universal recommender
>>>>
>>>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>
>>>>> To add more details to this.  When I attempt to execute my training
>>>>> job using the command 'pio train -- --master yarn' I get the exception that
>>>>> I've included below.  Can anyone tell me how to correctly submit the
>>>>> training job or what setting I need to change to make this work.  I've made
>>>>> not custom code changes and am simply using PIO 0.12.1 with the
>>>>> SimilarProduct Recommender.
>>>>>
>>>>>
>>>>>
>>>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040
>>>>> }
>>>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
>>>>> request executors before the AM has registered!
>>>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:154)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:152)
>>>>>         at scala.collection.IndexedSeqOpt
>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>         at scala.collection.mutable.Array
>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>> l.scala:152)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.submitApplication(Client.scala:172)
>>>>>         at org.apache.spark.scheduler.clu
>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>> kend.scala:56)
>>>>>         at org.apache.spark.scheduler.Tas
>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>         at org.apache.spark.SparkContext.
>>>>> <init>(SparkContext.scala:509)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccess
>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>         at sun.reflect.DelegatingMethodAc
>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>> mit.scala:751)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>>
>>>>>> So updating the version in the RELEASE file to 2.1.1 fixed the
>>>>>> version detection problem but I'm still not able to submit Spark jobs
>>>>>> unless they are strictly local.  How are you submitting to the HDP Spark?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --Cliff.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>>>> suyash.kharade@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Miller,
>>>>>>>     I faced same issue.
>>>>>>>     It is giving error as release file has '-' in version
>>>>>>>     Insert simple version in release file something like 2.6.
>>>>>>>
>>>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>>>>
>>>>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>>>>> eventserver &', I get the following error.*
>>>>>>>>
>>>>>>>> ####
>>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>>>>> 2.2.6.2.14-5: integer expression expected
>>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>>>> ".2.6.2.14-5")
>>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>>>> ".2.6.2.14-5")
>>>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at
>>>>>>>> /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version
>>>>>>>> requirement of 1.3.0.
>>>>>>>> Aborting.
>>>>>>>>
>>>>>>>> ####
>>>>>>>>
>>>>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the
>>>>>>>> RELEASE with an empty file, I can then start the Eventserver, which gives
>>>>>>>> me the following message:*
>>>>>>>>
>>>>>>>> ###
>>>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is
>>>>>>>> a known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>>>>> are using at least 1.3.0.
>>>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>>>>> cannot be used because libhadoop cannot be loaded.
>>>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>>>> ####
>>>>>>>>
>>>>>>>> *I can then send events to the Eventserver.  After sending the
>>>>>>>> events listed in the SimilarProduct Recommender example I am unable to
>>>>>>>> train.  Using the cluster.  If I use 'pio train' then it successfully
>>>>>>>> trains locally.  If I atttempt to use the command "pio train -- --master
>>>>>>>> yarn" then I get the following:*
>>>>>>>>
>>>>>>>> #######
>>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
>>>>>>>> 1
>>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>>>> rnSparkHadoopUtil.scala:154)
>>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>>>> rnSparkHadoopUtil.scala:152)
>>>>>>>>         at scala.collection.IndexedSeqOpt
>>>>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>>>         at scala.collection.mutable.Array
>>>>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>>>>> l.scala:152)
>>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>>> lient.submitApplication(Client.scala:172)
>>>>>>>>         at org.apache.spark.scheduler.clu
>>>>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>>>>> kend.scala:56)
>>>>>>>>         at org.apache.spark.scheduler.Tas
>>>>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>>>         at org.apache.spark.SparkContext.
>>>>>>>> <init>(SparkContext.scala:509)
>>>>>>>>         at org.apache.predictionio.workfl
>>>>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>>>         at org.apache.predictionio.workfl
>>>>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>>>         at org.apache.predictionio.workfl
>>>>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>>>         at org.apache.predictionio.workfl
>>>>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>>> Method)
>>>>>>>>         at sun.reflect.NativeMethodAccess
>>>>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>         at sun.reflect.DelegatingMethodAc
>>>>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>>>>> mit.scala:751)
>>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>>> ubmit.main(SparkSubmit.scala)
>>>>>>>>
>>>>>>>> ########
>>>>>>>>
>>>>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>>>>> training?*
>>>>>>>>
>>>>>>>> *Thanks,*
>>>>>>>>
>>>>>>>> *--Cliff.*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Suyash K
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Suyash K
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Suyash K
>>
>
>
>
>


-- 
Regards,
Suyash K

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.

Are you installing PIO on a client node created by HDP or something else?



On Wed, May 30, 2018 at 2:25 PM, suyash kharade <su...@gmail.com>
wrote:

> I am using hdp 2.6.4
>
> On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
> clifford.miller@phoenix-opsgroup.com> wrote:
>
>> That's the command that I'm using but it gives me the exception that I
>> listed in the previous email.  I've installed a Spark standalone cluster
>> and am using that for training for now but would like to use Spark on YARN
>> eventually.
>>
>> Are you using HDP? If so, what version of HDP are you using?  I'm using
>> *HDP-2.6.2.14.*
>>
>>
>>
>> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <suyash.kharade@gmail.com
>> > wrote:
>>
>>> I use 'pio train -- --master yarn'
>>> It works for me to train universal recommender
>>>
>>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>
>>>> To add more details to this.  When I attempt to execute my training job
>>>> using the command 'pio train -- --master yarn' I get the exception that
>>>> I've included below.  Can anyone tell me how to correctly submit the
>>>> training job or what setting I need to change to make this work.  I've made
>>>> not custom code changes and am simply using PIO 0.12.1 with the
>>>> SimilarProduct Recommender.
>>>>
>>>>
>>>>
>>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
>>>> request executors before the AM has registered!
>>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
>>>> qOptimized.scala:33)
>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.sca
>>>> la:186)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>>>> InputString(YarnSparkHadoopUtil.scala:152)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>>> 6.apply(Client.scala:819)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>>> 6.apply(Client.scala:817)
>>>>         at scala.Option.foreach(Option.scala:257)
>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>>>> ala:817)
>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>>>> text(Client.scala:911)
>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>>>> .scala:172)
>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>>>> d.start(YarnClientSchedulerBackend.scala:56)
>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>>>> ulerImpl.scala:156)
>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(Work
>>>> flowContext.scala:45)
>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>>> Workflow.scala:59)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>>> Workflow.scala:251)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>>> orkflow.scala)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>>> ssorImpl.java:62)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>>> thodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>>> .scala:187)
>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>>> a:212)
>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>>>> 126)
>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>
>>>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>>>> detection problem but I'm still not able to submit Spark jobs unless they
>>>>> are strictly local.  How are you submitting to the HDP Spark?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Cliff.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>>> suyash.kharade@gmail.com> wrote:
>>>>>
>>>>>> Hi Miller,
>>>>>>     I faced same issue.
>>>>>>     It is giving error as release file has '-' in version
>>>>>>     Insert simple version in release file something like 2.6.
>>>>>>
>>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>>>
>>>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>>>> eventserver &', I get the following error.*
>>>>>>>
>>>>>>> ####
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>>>> 2.2.6.2.14-5: integer expression expected
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>>> ".2.6.2.14-5")
>>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>>> ".2.6.2.14-5")
>>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at
>>>>>>> /usr/hdp/2.6.2.14-5/spark2/ which does not meet the minimum version
>>>>>>> requirement of 1.3.0.
>>>>>>> Aborting.
>>>>>>>
>>>>>>> ####
>>>>>>>
>>>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the
>>>>>>> RELEASE with an empty file, I can then start the Eventserver, which gives
>>>>>>> me the following message:*
>>>>>>>
>>>>>>> ###
>>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is
>>>>>>> a known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>>>> are using at least 1.3.0.
>>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>>>> cannot be used because libhadoop cannot be loaded.
>>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>>> ####
>>>>>>>
>>>>>>> *I can then send events to the Eventserver.  After sending the
>>>>>>> events listed in the SimilarProduct Recommender example I am unable to
>>>>>>> train.  Using the cluster.  If I use 'pio train' then it successfully
>>>>>>> trains locally.  If I atttempt to use the command "pio train -- --master
>>>>>>> yarn" then I get the following:*
>>>>>>>
>>>>>>> #######
>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
>>>>>>> 1
>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>>> rnSparkHadoopUtil.scala:154)
>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>>> rnSparkHadoopUtil.scala:152)
>>>>>>>         at scala.collection.IndexedSeqOpt
>>>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>>         at scala.collection.mutable.Array
>>>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>>>> l.scala:152)
>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>>> lient.submitApplication(Client.scala:172)
>>>>>>>         at org.apache.spark.scheduler.clu
>>>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>>>> kend.scala:56)
>>>>>>>         at org.apache.spark.scheduler.Tas
>>>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>>         at org.apache.spark.SparkContext.
>>>>>>> <init>(SparkContext.scala:509)
>>>>>>>         at org.apache.predictionio.workfl
>>>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>>         at org.apache.predictionio.workfl
>>>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>>         at org.apache.predictionio.workfl
>>>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>>         at org.apache.predictionio.workfl
>>>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>> Method)
>>>>>>>         at sun.reflect.NativeMethodAccess
>>>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>         at sun.reflect.DelegatingMethodAc
>>>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>>>> mit.scala:751)
>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>>> ubmit.main(SparkSubmit.scala)
>>>>>>>
>>>>>>> ########
>>>>>>>
>>>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>>>> training?*
>>>>>>>
>>>>>>> *Thanks,*
>>>>>>>
>>>>>>> *--Cliff.*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Suyash K
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Suyash K
>>>
>>
>>
>>
>>
>>
>
>
> --
> Regards,
> Suyash K
>

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by suyash kharade <su...@gmail.com>.

I am using hdp 2.6.4

On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> That's the command that I'm using but it gives me the exception that I
> listed in the previous email.  I've installed a Spark standalone cluster
> and am using that for training for now but would like to use Spark on YARN
> eventually.
>
> Are you using HDP? If so, what version of HDP are you using?  I'm using
> *HDP-2.6.2.14.*
>
>
>
> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <su...@gmail.com>
> wrote:
>
>> I use 'pio train -- --master yarn'
>> It works for me to train universal recommender
>>
>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> To add more details to this.  When I attempt to execute my training job
>>> using the command 'pio train -- --master yarn' I get the exception that
>>> I've included below.  Can anyone tell me how to correctly submit the
>>> training job or what setting I need to change to make this work.  I've made
>>> not custom code changes and am simply using PIO 0.12.1 with the
>>> SimilarProduct Recommender.
>>>
>>>
>>>
>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
>>> request executors before the AM has registered!
>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
>>> qOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.sca
>>> la:186)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>>> InputString(YarnSparkHadoopUtil.scala:152)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:819)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:817)
>>>         at scala.Option.foreach(Option.scala:257)
>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>>> ala:817)
>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>>> text(Client.scala:911)
>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>>> .scala:172)
>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>>> d.start(YarnClientSchedulerBackend.scala:56)
>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>>> ulerImpl.scala:156)
>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(Work
>>> flowContext.scala:45)
>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>> Workflow.scala:59)
>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>> Workflow.scala:251)
>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>> orkflow.scala)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>> .scala:187)
>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>> a:212)
>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>>> 126)
>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>>
>>>
>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>
>>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>>> detection problem but I'm still not able to submit Spark jobs unless they
>>>> are strictly local.  How are you submitting to the HDP Spark?
>>>>
>>>> Thanks,
>>>>
>>>> --Cliff.
>>>>
>>>>
>>>>
>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>> suyash.kharade@gmail.com> wrote:
>>>>
>>>>> Hi Miller,
>>>>>     I faced same issue.
>>>>>     It is giving error as release file has '-' in version
>>>>>     Insert simple version in release file something like 2.6.
>>>>>
>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>>
>>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>>> eventserver &', I get the following error.*
>>>>>>
>>>>>> ####
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>>> 2.2.6.2.14-5: integer expression expected
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>> ".2.6.2.14-5")
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>> ".2.6.2.14-5")
>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>>>>> which does not meet the minimum version requirement of 1.3.0.
>>>>>> Aborting.
>>>>>>
>>>>>> ####
>>>>>>
>>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>>> following message:*
>>>>>>
>>>>>> ###
>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>>> are using at least 1.3.0.
>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>>> cannot be used because libhadoop cannot be loaded.
>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>> ####
>>>>>>
>>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>>> listed in the SimilarProduct Recommender example I am unable to train.
>>>>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>>>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>>>>> then I get the following:*
>>>>>>
>>>>>> #######
>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
>>>>>> 1
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>> rnSparkHadoopUtil.scala:154)
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>> rnSparkHadoopUtil.scala:152)
>>>>>>         at scala.collection.IndexedSeqOpt
>>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>         at scala.collection.mutable.Array
>>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>>> l.scala:152)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.submitApplication(Client.scala:172)
>>>>>>         at org.apache.spark.scheduler.clu
>>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>>> kend.scala:56)
>>>>>>         at org.apache.spark.scheduler.Tas
>>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>         at org.apache.spark.SparkContext.
>>>>>> <init>(SparkContext.scala:509)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>> Method)
>>>>>>         at sun.reflect.NativeMethodAccess
>>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>         at sun.reflect.DelegatingMethodAc
>>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>>> mit.scala:751)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit.main(SparkSubmit.scala)
>>>>>>
>>>>>> ########
>>>>>>
>>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>>> training?*
>>>>>>
>>>>>> *Thanks,*
>>>>>>
>>>>>> *--Cliff.*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Suyash K
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> Regards,
>> Suyash K
>>
>
>
>
>
>


-- 
Regards,
Suyash K

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by suyash kharade <su...@gmail.com>.

On Wed, May 30, 2018 at 7:15 AM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> That's the command that I'm using but it gives me the exception that I
> listed in the previous email.  I've installed a Spark standalone cluster
> and am using that for training for now but would like to use Spark on YARN
> eventually.
>
> Are you using HDP? If so, what version of HDP are you using?  I'm using
> *HDP-2.6.2.14.*
>
>
>
> On Tue, May 29, 2018 at 8:55 PM, suyash kharade <su...@gmail.com>
> wrote:
>
>> I use 'pio train -- --master yarn'
>> It works for me to train universal recommender
>>
>> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> To add more details to this.  When I attempt to execute my training job
>>> using the command 'pio train -- --master yarn' I get the exception that
>>> I've included below.  Can anyone tell me how to correctly submit the
>>> training job or what setting I need to change to make this work.  I've made
>>> not custom code changes and am simply using PIO 0.12.1 with the
>>> SimilarProduct Recommender.
>>>
>>>
>>>
>>> [ERROR] [SparkContext] Error initializing SparkContext.
>>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
>>> request executors before the AM has registered!
>>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
>>> qOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.sca
>>> la:186)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>>> InputString(YarnSparkHadoopUtil.scala:152)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:819)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:817)
>>>         at scala.Option.foreach(Option.scala:257)
>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>>> ala:817)
>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>>> text(Client.scala:911)
>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>>> .scala:172)
>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>>> d.start(YarnClientSchedulerBackend.scala:56)
>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>>> ulerImpl.scala:156)
>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(Work
>>> flowContext.scala:45)
>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>> Workflow.scala:59)
>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>> Workflow.scala:251)
>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>> orkflow.scala)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>> .scala:187)
>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>> a:212)
>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>>> 126)
>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>>
>>>
>>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>
>>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>>> detection problem but I'm still not able to submit Spark jobs unless they
>>>> are strictly local.  How are you submitting to the HDP Spark?
>>>>
>>>> Thanks,
>>>>
>>>> --Cliff.
>>>>
>>>>
>>>>
>>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>>> suyash.kharade@gmail.com> wrote:
>>>>
>>>>> Hi Miller,
>>>>>     I faced same issue.
>>>>>     It is giving error as release file has '-' in version
>>>>>     Insert simple version in release file something like 2.6.
>>>>>
>>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>>
>>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>>> eventserver &', I get the following error.*
>>>>>>
>>>>>> ####
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>>> 2.2.6.2.14-5: integer expression expected
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>> ".2.6.2.14-5")
>>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>>> ".2.6.2.14-5")
>>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>>>>> which does not meet the minimum version requirement of 1.3.0.
>>>>>> Aborting.
>>>>>>
>>>>>> ####
>>>>>>
>>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>>> following message:*
>>>>>>
>>>>>> ###
>>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>>> are using at least 1.3.0.
>>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>>> cannot be used because libhadoop cannot be loaded.
>>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>>> ####
>>>>>>
>>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>>> listed in the SimilarProduct Recommender example I am unable to train.
>>>>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>>>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>>>>> then I get the following:*
>>>>>>
>>>>>> #######
>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
>>>>>> 1
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>> rnSparkHadoopUtil.scala:154)
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>>> rnSparkHadoopUtil.scala:152)
>>>>>>         at scala.collection.IndexedSeqOpt
>>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>         at scala.collection.mutable.Array
>>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>>> l.scala:152)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>>         at org.apache.spark.deploy.yarn.C
>>>>>> lient.submitApplication(Client.scala:172)
>>>>>>         at org.apache.spark.scheduler.clu
>>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>>> kend.scala:56)
>>>>>>         at org.apache.spark.scheduler.Tas
>>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>>         at org.apache.spark.SparkContext.
>>>>>> <init>(SparkContext.scala:509)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>>         at org.apache.predictionio.workfl
>>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>> Method)
>>>>>>         at sun.reflect.NativeMethodAccess
>>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>         at sun.reflect.DelegatingMethodAc
>>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>>> mit.scala:751)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>>         at org.apache.spark.deploy.SparkS
>>>>>> ubmit.main(SparkSubmit.scala)
>>>>>>
>>>>>> ########
>>>>>>
>>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>>> training?*
>>>>>>
>>>>>> *Thanks,*
>>>>>>
>>>>>> *--Cliff.*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Suyash K
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> Regards,
>> Suyash K
>>
>
>
>
>
>


-- 
Regards,
Suyash K

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by Pat Ferrel <pa...@actionml.com>.

Yarn has to be started explicitly. Usually it is part of Hadoop and is
started with Hadoop. Spark only contains the client for Yarn (afaik).



From: Miller, Clifford <cl...@phoenix-opsgroup.com>
<cl...@phoenix-opsgroup.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 29, 2018 at 6:45:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  Re: PIO 0.12.1 with HDP Spark on YARN

That's the command that I'm using but it gives me the exception that I
listed in the previous email.  I've installed a Spark standalone cluster
and am using that for training for now but would like to use Spark on YARN
eventually.

Are you using HDP? If so, what version of HDP are you using?  I'm using
*HDP-2.6.2.14.*



On Tue, May 29, 2018 at 8:55 PM, suyash kharade <su...@gmail.com>
wrote:

> I use 'pio train -- --master yarn'
> It works for me to train universal recommender
>
> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
> clifford.miller@phoenix-opsgroup.com> wrote:
>
>> To add more details to this.  When I attempt to execute my training job
>> using the command 'pio train -- --master yarn' I get the exception that
>> I've included below.  Can anyone tell me how to correctly submit the
>> training job or what setting I need to change to make this work.  I've made
>> not custom code changes and am simply using PIO 0.12.1 with the
>> SimilarProduct Recommender.
>>
>>
>>
>> [ERROR] [SparkContext] Error initializing SparkContext.
>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request
>> executors before the AM has registered!
>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(
>> IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.
>> scala:186)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>> InputString(YarnSparkHadoopUtil.scala:152)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:819)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:817)
>>         at scala.Option.foreach(Option.scala:257)
>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>> ala:817)
>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>> text(Client.scala:911)
>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>> .scala:172)
>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>> d.start(YarnClientSchedulerBackend.scala:56)
>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>> ulerImpl.scala:156)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(
>> WorkflowContext.scala:45)
>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(
>> CoreWorkflow.scala:59)
>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>> Workflow.scala:251)
>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>> orkflow.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:187)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.
>> scala:212)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>> 126)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>>
>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>> detection problem but I'm still not able to submit Spark jobs unless they
>>> are strictly local.  How are you submitting to the HDP Spark?
>>>
>>> Thanks,
>>>
>>> --Cliff.
>>>
>>>
>>>
>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>> suyash.kharade@gmail.com> wrote:
>>>
>>>> Hi Miller,
>>>>     I faced same issue.
>>>>     It is giving error as release file has '-' in version
>>>>     Insert simple version in release file something like 2.6.
>>>>
>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>
>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>> eventserver &', I get the following error.*
>>>>>
>>>>> ####
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>> 2.2.6.2.14-5: integer expression expected
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>> ".2.6.2.14-5")
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>> ".2.6.2.14-5")
>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>>>> which does not meet the minimum version requirement of 1.3.0.
>>>>> Aborting.
>>>>>
>>>>> ####
>>>>>
>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>> following message:*
>>>>>
>>>>> ###
>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>> are using at least 1.3.0.
>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>> cannot be used because libhadoop cannot be loaded.
>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>> ####
>>>>>
>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>> listed in the SimilarProduct Recommender example I am unable to train.
>>>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>>>> then I get the following:*
>>>>>
>>>>> #######
>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:154)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:152)
>>>>>         at scala.collection.IndexedSeqOpt
>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>         at scala.collection.mutable.Array
>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>> l.scala:152)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.submitApplication(Client.scala:172)
>>>>>         at org.apache.spark.scheduler.clu
>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>> kend.scala:56)
>>>>>         at org.apache.spark.scheduler.Tas
>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>         at org.apache.spark.SparkContext.
>>>>> <init>(SparkContext.scala:509)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccess
>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>         at sun.reflect.DelegatingMethodAc
>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>> mit.scala:751)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>> ########
>>>>>
>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>> training?*
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>> *--Cliff.*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Suyash K
>>>>
>>>
>>>
>>>
>>>
>>>
>
>
> --
> Regards,
> Suyash K
>

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.

That's the command that I'm using but it gives me the exception that I
listed in the previous email.  I've installed a Spark standalone cluster
and am using that for training for now but would like to use Spark on YARN
eventually.

Are you using HDP? If so, what version of HDP are you using?  I'm using
*HDP-2.6.2.14.*



On Tue, May 29, 2018 at 8:55 PM, suyash kharade <su...@gmail.com>
wrote:

> I use 'pio train -- --master yarn'
> It works for me to train universal recommender
>
> On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
> clifford.miller@phoenix-opsgroup.com> wrote:
>
>> To add more details to this.  When I attempt to execute my training job
>> using the command 'pio train -- --master yarn' I get the exception that
>> I've included below.  Can anyone tell me how to correctly submit the
>> training job or what setting I need to change to make this work.  I've made
>> not custom code changes and am simply using PIO 0.12.1 with the
>> SimilarProduct Recommender.
>>
>>
>>
>> [ERROR] [SparkContext] Error initializing SparkContext.
>> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
>> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request
>> executors before the AM has registered!
>> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(
>> IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.
>> scala:186)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>> InputString(YarnSparkHadoopUtil.scala:152)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:819)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:817)
>>         at scala.Option.foreach(Option.scala:257)
>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>> ala:817)
>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>> text(Client.scala:911)
>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>> .scala:172)
>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>> d.start(YarnClientSchedulerBackend.scala:56)
>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>> ulerImpl.scala:156)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(
>> WorkflowContext.scala:45)
>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(
>> CoreWorkflow.scala:59)
>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>> Workflow.scala:251)
>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>> orkflow.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:187)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.
>> scala:212)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>> 126)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>>
>> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>>> detection problem but I'm still not able to submit Spark jobs unless they
>>> are strictly local.  How are you submitting to the HDP Spark?
>>>
>>> Thanks,
>>>
>>> --Cliff.
>>>
>>>
>>>
>>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <
>>> suyash.kharade@gmail.com> wrote:
>>>
>>>> Hi Miller,
>>>>     I faced same issue.
>>>>     It is giving error as release file has '-' in version
>>>>     Insert simple version in release file something like 2.6.
>>>>
>>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>>
>>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As
>>>>> part of that installation I created some HDP (Ambari) managed clients.  I
>>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>>> eventserver &', I get the following error.*
>>>>>
>>>>> ####
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>>> 2.2.6.2.14-5: integer expression expected
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>> ".2.6.2.14-5")
>>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>>> ".2.6.2.14-5")
>>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>>>> which does not meet the minimum version requirement of 1.3.0.
>>>>> Aborting.
>>>>>
>>>>> ####
>>>>>
>>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>>> following message:*
>>>>>
>>>>> ###
>>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>>> are using at least 1.3.0.
>>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>>> cannot be used because libhadoop cannot be loaded.
>>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>>> ####
>>>>>
>>>>> *I can then send events to the Eventserver.  After sending the events
>>>>> listed in the SimilarProduct Recommender example I am unable to train.
>>>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>>>> then I get the following:*
>>>>>
>>>>> #######
>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:154)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(Ya
>>>>> rnSparkHadoopUtil.scala:152)
>>>>>         at scala.collection.IndexedSeqOpt
>>>>> imized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>         at scala.collection.mutable.Array
>>>>> Ops$ofRef.foreach(ArrayOps.scala:186)
>>>>>         at org.apache.spark.deploy.yarn.Y
>>>>> arnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUti
>>>>> l.scala:152)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.setupLaunchEnv(Client.scala:817)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.createContainerLaunchContext(Client.scala:911)
>>>>>         at org.apache.spark.deploy.yarn.C
>>>>> lient.submitApplication(Client.scala:172)
>>>>>         at org.apache.spark.scheduler.clu
>>>>> ster.YarnClientSchedulerBackend.start(YarnClientSchedulerBac
>>>>> kend.scala:56)
>>>>>         at org.apache.spark.scheduler.Tas
>>>>> kSchedulerImpl.start(TaskSchedulerImpl.scala:156)
>>>>>         at org.apache.spark.SparkContext.
>>>>> <init>(SparkContext.scala:509)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.WorkflowContext$.apply(WorkflowContext.scala:45)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow$.main(CreateWorkflow.scala:251)
>>>>>         at org.apache.predictionio.workfl
>>>>> ow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at sun.reflect.NativeMethodAccess
>>>>> orImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>         at sun.reflect.DelegatingMethodAc
>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSub
>>>>> mit.scala:751)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.submit(SparkSubmit.scala:212)
>>>>>         at org.apache.spark.deploy.SparkS
>>>>> ubmit$.main(SparkSubmit.scala:126)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>> ########
>>>>>
>>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>>> training?*
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>> *--Cliff.*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Suyash K
>>>>
>>>
>>>
>>>
>>>
>>>
>
>
> --
> Regards,
> Suyash K
>

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by suyash kharade <su...@gmail.com>.

I use 'pio train -- --master yarn'
It works for me to train universal recommender

On Tue, May 29, 2018 at 8:31 PM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> To add more details to this.  When I attempt to execute my training job
> using the command 'pio train -- --master yarn' I get the exception that
> I've included below.  Can anyone tell me how to correctly submit the
> training job or what setting I need to change to make this work.  I've made
> not custom code changes and am simply using PIO 0.12.1 with the
> SimilarProduct Recommender.
>
>
>
> [ERROR] [SparkContext] Error initializing SparkContext.
> [INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request
> executors before the AM has registered!
> [WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$
> setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$
> setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>         at scala.collection.IndexedSeqOptimized$class.
> foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(
> ArrayOps.scala:186)
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.
> setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$
> setupLaunchEnv$6.apply(Client.scala:819)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$
> setupLaunchEnv$6.apply(Client.scala:817)
>         at scala.Option.foreach(Option.scala:257)
>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.
> scala:817)
>         at org.apache.spark.deploy.yarn.Client.
> createContainerLaunchContext(Client.scala:911)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(
> Client.scala:172)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.
> start(YarnClientSchedulerBackend.scala:56)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(
> TaskSchedulerImpl.scala:156)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>         at org.apache.predictionio.workflow.WorkflowContext$.
> apply(WorkflowContext.scala:45)
>         at org.apache.predictionio.workflow.CoreWorkflow$.
> runTrain(CoreWorkflow.scala:59)
>         at org.apache.predictionio.workflow.CreateWorkflow$.main(
> CreateWorkflow.scala:251)
>         at org.apache.predictionio.workflow.CreateWorkflow.main(
> CreateWorkflow.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:187)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:212)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:126)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
>
> On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
> clifford.miller@phoenix-opsgroup.com> wrote:
>
>> So updating the version in the RELEASE file to 2.1.1 fixed the version
>> detection problem but I'm still not able to submit Spark jobs unless they
>> are strictly local.  How are you submitting to the HDP Spark?
>>
>> Thanks,
>>
>> --Cliff.
>>
>>
>>
>> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <suyash.kharade@gmail.com
>> > wrote:
>>
>>> Hi Miller,
>>>     I faced same issue.
>>>     It is giving error as release file has '-' in version
>>>     Insert simple version in release file something like 2.6.
>>>
>>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>>> clifford.miller@phoenix-opsgroup.com> wrote:
>>>
>>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As part
>>>> of that installation I created some HDP (Ambari) managed clients.  I
>>>> installed PIO on one of these clients and configured PIO to use the HDP
>>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>>> eventserver &', I get the following error.*
>>>>
>>>> ####
>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>>> 2.2.6.2.14-5: integer expression expected
>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>> ".2.6.2.14-5")
>>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>>> ".2.6.2.14-5")
>>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>>> which does not meet the minimum version requirement of 1.3.0.
>>>> Aborting.
>>>>
>>>> ####
>>>>
>>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>>> with an empty file, I can then start the Eventserver, which gives me the
>>>> following message:*
>>>>
>>>> ###
>>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>>> are using at least 1.3.0.
>>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>>> cannot be used because libhadoop cannot be loaded.
>>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>>> ####
>>>>
>>>> *I can then send events to the Eventserver.  After sending the events
>>>> listed in the SimilarProduct Recommender example I am unable to train.
>>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>>> then I get the following:*
>>>>
>>>> #######
>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
>>>> qOptimized.scala:33)
>>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.sca
>>>> la:186)
>>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>>>> InputString(YarnSparkHadoopUtil.scala:152)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>>> 6.apply(Client.scala:819)
>>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>>> 6.apply(Client.scala:817)
>>>>         at scala.Option.foreach(Option.scala:257)
>>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>>>> ala:817)
>>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>>>> text(Client.scala:911)
>>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>>>> .scala:172)
>>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>>>> d.start(YarnClientSchedulerBackend.scala:56)
>>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>>>> ulerImpl.scala:156)
>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(Work
>>>> flowContext.scala:45)
>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>>> Workflow.scala:59)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>>> Workflow.scala:251)
>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>>> orkflow.scala)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>>> ssorImpl.java:62)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>>> thodAccessorImpl.java:43)
>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>>> .scala:187)
>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>>> a:212)
>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>>>> 126)
>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> ########
>>>>
>>>> *What is the correct way to get PIO to use the YARN based Spark for
>>>> training?*
>>>>
>>>> *Thanks,*
>>>>
>>>> *--Cliff.*
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Suyash K
>>>
>>
>>
>>
>>
>>


-- 
Regards,
Suyash K

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.

To add more details to this.  When I attempt to execute my training job
using the command 'pio train -- --master yarn' I get the exception that
I've included below.  Can anyone tell me how to correctly submit the
training job or what setting I need to change to make this work.  I've made
not custom code changes and am simply using PIO 0.12.1 with the
SimilarProduct Recommender.



[ERROR] [SparkContext] Error initializing SparkContext.
[INFO] [ServerConnector] Stopped Spark@1f992a3a{HTTP/1.1}{0.0.0.0:4040}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request
executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
        at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
        at
org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:819)
        at
org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:817)
        at scala.Option.foreach(Option.scala:257)
        at
org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:817)
        at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:911)
        at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
        at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at
org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at
org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at
org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
        at
org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
        at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




On Tue, May 29, 2018 at 12:01 AM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> So updating the version in the RELEASE file to 2.1.1 fixed the version
> detection problem but I'm still not able to submit Spark jobs unless they
> are strictly local.  How are you submitting to the HDP Spark?
>
> Thanks,
>
> --Cliff.
>
>
>
> On Mon, May 28, 2018 at 1:12 AM, suyash kharade <su...@gmail.com>
> wrote:
>
>> Hi Miller,
>>     I faced same issue.
>>     It is giving error as release file has '-' in version
>>     Insert simple version in release file something like 2.6.
>>
>> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
>> clifford.miller@phoenix-opsgroup.com> wrote:
>>
>>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As part
>>> of that installation I created some HDP (Ambari) managed clients.  I
>>> installed PIO on one of these clients and configured PIO to use the HDP
>>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>>> eventserver &', I get the following error.*
>>>
>>> ####
>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>>> 2.2.6.2.14-5: integer expression expected
>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>> ".2.6.2.14-5")
>>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>>> ".2.6.2.14-5")
>>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>>> which does not meet the minimum version requirement of 1.3.0.
>>> Aborting.
>>>
>>> ####
>>>
>>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>>> with an empty file, I can then start the Eventserver, which gives me the
>>> following message:*
>>>
>>> ###
>>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>>> are using at least 1.3.0.
>>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>>> [WARN] [DomainSocketFactory] The short-circuit local reads feature
>>> cannot be used because libhadoop cannot be loaded.
>>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>>> ####
>>>
>>> *I can then send events to the Eventserver.  After sending the events
>>> listed in the SimilarProduct Recommender example I am unable to train.
>>> Using the cluster.  If I use 'pio train' then it successfully trains
>>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>>> then I get the following:*
>>>
>>> #######
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
>>> qOptimized.scala:33)
>>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.sca
>>> la:186)
>>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>>> InputString(YarnSparkHadoopUtil.scala:152)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:819)
>>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>>> 6.apply(Client.scala:817)
>>>         at scala.Option.foreach(Option.scala:257)
>>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>>> ala:817)
>>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>>> text(Client.scala:911)
>>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>>> .scala:172)
>>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>>> d.start(YarnClientSchedulerBackend.scala:56)
>>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>>> ulerImpl.scala:156)
>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(Work
>>> flowContext.scala:45)
>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>> Workflow.scala:59)
>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>> Workflow.scala:251)
>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>> orkflow.scala)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>> .scala:187)
>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>> a:212)
>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>>> 126)
>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> ########
>>>
>>> *What is the correct way to get PIO to use the YARN based Spark for
>>> training?*
>>>
>>> *Thanks,*
>>>
>>> *--Cliff.*
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Suyash K
>>
>
>
>
>
>

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.

So updating the version in the RELEASE file to 2.1.1 fixed the version
detection problem but I'm still not able to submit Spark jobs unless they
are strictly local.  How are you submitting to the HDP Spark?

Thanks,

--Cliff.



On Mon, May 28, 2018 at 1:12 AM, suyash kharade <su...@gmail.com>
wrote:

> Hi Miller,
>     I faced same issue.
>     It is giving error as release file has '-' in version
>     Insert simple version in release file something like 2.6.
>
> On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
> clifford.miller@phoenix-opsgroup.com> wrote:
>
>> *I've installed an HDP cluster with Hbase and Spark with YARN.  As part
>> of that installation I created some HDP (Ambari) managed clients.  I
>> installed PIO on one of these clients and configured PIO to use the HDP
>> installed Hadoop, HBase, and Spark.  When I run the command 'pio
>> eventserver &', I get the following error.*
>>
>> ####
>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [:
>> 2.2.6.2.14-5: integer expression expected
>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>> ".2.6.2.14-5")
>> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
>> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
>> ".2.6.2.14-5")
>> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
>> which does not meet the minimum version requirement of 1.3.0.
>> Aborting.
>>
>> ####
>>
>> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE
>> with an empty file, I can then start the Eventserver, which gives me the
>> following message:*
>>
>> ###
>> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
>> known problem with certain vendors (e.g. Cloudera). Please make sure you
>> are using at least 1.3.0.
>> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
>> [WARN] [DomainSocketFactory] The short-circuit local reads feature cannot
>> be used because libhadoop cannot be loaded.
>> [INFO] [HttpListener] Bound to /0.0.0.0:7070
>> [INFO] [EventServerActor] Bound received. EventServer is ready.
>> ####
>>
>> *I can then send events to the Eventserver.  After sending the events
>> listed in the SimilarProduct Recommender example I am unable to train.
>> Using the cluster.  If I use 'pio train' then it successfully trains
>> locally.  If I atttempt to use the command "pio train -- --master yarn"
>> then I get the following:*
>>
>> #######
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$se
>> tEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(
>> IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.
>> scala:186)
>>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFrom
>> InputString(YarnSparkHadoopUtil.scala:152)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:819)
>>         at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$
>> 6.apply(Client.scala:817)
>>         at scala.Option.foreach(Option.scala:257)
>>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.sc
>> ala:817)
>>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchCon
>> text(Client.scala:911)
>>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client
>> .scala:172)
>>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBacken
>> d.start(YarnClientSchedulerBackend.scala:56)
>>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSched
>> ulerImpl.scala:156)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>>         at org.apache.predictionio.workflow.WorkflowContext$.apply(
>> WorkflowContext.scala:45)
>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(
>> CoreWorkflow.scala:59)
>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>> Workflow.scala:251)
>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>> orkflow.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:751)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:187)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.
>> scala:212)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>> 126)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> ########
>>
>> *What is the correct way to get PIO to use the YARN based Spark for
>> training?*
>>
>> *Thanks,*
>>
>> *--Cliff.*
>>
>>
>>
>>
>
>
> --
> Regards,
> Suyash K
>



-- 
Clifford Miller
Mobile | 321.431.9089

Re: PIO 0.12.1 with HDP Spark on YARN

Posted by suyash kharade <su...@gmail.com>.

Hi Miller,
    I faced same issue.
    It is giving error as release file has '-' in version
    Insert simple version in release file something like 2.6.

On Mon, May 28, 2018 at 4:32 AM, Miller, Clifford <
clifford.miller@phoenix-opsgroup.com> wrote:

> *I've installed an HDP cluster with Hbase and Spark with YARN.  As part of
> that installation I created some HDP (Ambari) managed clients.  I installed
> PIO on one of these clients and configured PIO to use the HDP installed
> Hadoop, HBase, and Spark.  When I run the command 'pio eventserver &', I
> get the following error.*
>
> ####
> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 89: [: 2.2.6.2.14-5:
> integer expression expected
> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 93: [[:
> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
> ".2.6.2.14-5")
> /home/centos/PredictionIO-0.12.1/bin/semver.sh: line 97: [[:
> 2.2.6.2.14-5: syntax error: invalid arithmetic operator (error token is
> ".2.6.2.14-5")
> You have Apache Spark 2.1.1.2.6.2.14-5 at /usr/hdp/2.6.2.14-5/spark2/
> which does not meet the minimum version requirement of 1.3.0.
> Aborting.
>
> ####
>
> *If I then go to  /usr/hdp/2.6.2.14-5/spark2/ and replace the RELEASE with
> an empty file, I can then start the Eventserver, which gives me the
> following message:*
>
> ###
> /usr/hdp/2.6.2.14-5/spark2/ contains an empty RELEASE file. This is a
> known problem with certain vendors (e.g. Cloudera). Please make sure you
> are using at least 1.3.0.
> [INFO] [Management$] Creating Event Server at 0.0.0.0:7070
> [WARN] [DomainSocketFactory] The short-circuit local reads feature cannot
> be used because libhadoop cannot be loaded.
> [INFO] [HttpListener] Bound to /0.0.0.0:7070
> [INFO] [EventServerActor] Bound received. EventServer is ready.
> ####
>
> *I can then send events to the Eventserver.  After sending the events
> listed in the SimilarProduct Recommender example I am unable to train.
> Using the cluster.  If I use 'pio train' then it successfully trains
> locally.  If I atttempt to use the command "pio train -- --master yarn"
> then I get the following:*
>
> #######
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$
> setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$
> setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
>         at scala.collection.IndexedSeqOptimized$class.
> foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(
> ArrayOps.scala:186)
>         at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.
> setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$
> setupLaunchEnv$6.apply(Client.scala:819)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$
> setupLaunchEnv$6.apply(Client.scala:817)
>         at scala.Option.foreach(Option.scala:257)
>         at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.
> scala:817)
>         at org.apache.spark.deploy.yarn.Client.
> createContainerLaunchContext(Client.scala:911)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(
> Client.scala:172)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.
> start(YarnClientSchedulerBackend.scala:56)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(
> TaskSchedulerImpl.scala:156)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
>         at org.apache.predictionio.workflow.WorkflowContext$.
> apply(WorkflowContext.scala:45)
>         at org.apache.predictionio.workflow.CoreWorkflow$.
> runTrain(CoreWorkflow.scala:59)
>         at org.apache.predictionio.workflow.CreateWorkflow$.main(
> CreateWorkflow.scala:251)
>         at org.apache.predictionio.workflow.CreateWorkflow.main(
> CreateWorkflow.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:187)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:212)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:126)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> ########
>
> *What is the correct way to get PIO to use the YARN based Spark for
> training?*
>
> *Thanks,*
>
> *--Cliff.*
>
>
>
>


-- 
Regards,
Suyash K