Posted to user@predictionio.apache.org by Wojciech Kowalski <wo...@tomandco.co.uk> on 2018/05/22 21:20:28 UTC

Problem with training in yarn cluster

Hello, I am trying to set up a distributed cluster with all services separated, but I have a problem while running train:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
	at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


Setup:
HBase
Hadoop
HDFS
Spark cluster with YARN

Training is in cluster mode.
I assume the Spark worker is trying to save the log to /pio/pio.log on the worker machine instead of on the PIO host. How can I set the PIO log destination to an HDFS path?

Or any other advice?
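A possible workaround (an editor's sketch, not an official PIO fix): in yarn cluster mode the driver runs on an arbitrary node, so log4j's FileAppender fails on any machine where /pio/pio.log does not exist. Switching the log4j.properties shipped to the cluster via --files to a console appender lets YARN collect the output in its container logs instead of a fixed local path:

```properties
# Sketch of /pio/conf/log4j.properties for cluster mode (assumes the stock
# PIO layout): log to the console so YARN's container log collection picks
# it up, rather than writing to a hard-coded local file like /pio/pio.log.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n
```

With YARN log aggregation enabled, the output can then be retrieved with `yarn logs -applicationId <appId>`.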

Thanks,
Wojciech

RE: Problem with training in yarn cluster

Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
Hello,

Actually, I have another error in the logs that is preventing training as well:

[INFO] [RecommendationEngine$] 

               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|


      
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception: 

Thanks,
Wojciech



RE: Problem with training in yarn cluster

Posted by Pat Ferrel <pa...@actionml.com>.
Check the Spark GUI; it has a good way to browse logs on and in workers.
I don't know your data, but 4g is pretty small. How many executors are in your
cluster? Also, why do you need YARN? It adds a fair bit of complexity.


From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: Wojciech Kowalski <wo...@tomandco.co.uk>
Date: May 23, 2018 at 6:07:47 AM
To: Pat Ferrel <pa...@actionml.com>, Ambuj Sharma <am...@getamplify.com>, user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster

I just replaced the name of the shop, and the space was added in the process. I
don't think it has anything to do with the event names, as normal training
goes through fine.



Hmm, I can see that it has now stopped working without YARN as well :/



Some timeout, but I have no idea what it can't connect to; there is no info
about any host etc.



pio train -- --executor-memory 4g --driver-memory 4g  --verbose --master
local

[INFO] [Runner$] Submission command: /pio/vendors/spark/bin/spark-submit
--executor-memory 4g --driver-memory 4g --verbose --master local --class
org.apache.predictionio.workflow.CreateWorkflow --jars
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar
--files
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml
--driver-class-path
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf
--driver-java-options -Dpio.log.dir=/pio
file:/pio/lib/pio-assembly-0.12.1.jar --engine-id
com.actionml.RecommendationEngine --engine-version
59b11bc973a65a09bc1d1dee3db026e5c906e4f9 --engine-variant
file:/pio/engines/ob-ur-live/engine.json --verbosity 0 --json-extractor
Both --env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs

Using properties file: null

Parsed arguments:

  master                  local

  deployMode              null

  executorMemory          4g

  executorCores           null

  totalExecutorCores      null

  propertiesFile          null

  driverMemory            4g

  driverCores             null

  driverExtraClassPath
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf

  driverExtraLibraryPath  null

  driverExtraJavaOptions  -Dpio.log.dir=/pio

  supervise               false

  queue                   null

  numExecutors            null

  files
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml

  pyFiles                 null

  archives                null

  mainClass               org.apache.predictionio.workflow.CreateWorkflow

  primaryResource         file:/pio/lib/pio-assembly-0.12.1.jar

  name                    org.apache.predictionio.workflow.CreateWorkflow

  childArgs               [--engine-id com.actionml.RecommendationEngine
--engine-version 59b11bc973a65a09bc1d1dee3db026e5c906e4f9 --engine-variant
file:/pio/engines/ob-ur-live/engine.json --verbosity 0 --json-extractor
Both --env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs]

  jars
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar

  packages                null

  packagesExclusions      null

  repositories            null

  verbose                 true



Spark properties used, including those specified through

--conf and those from the properties file null:

  spark.driver.memory -> 4g

  spark.driver.extraJavaOptions -> -Dpio.log.dir=/pio

  spark.driver.extraClassPath ->
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf





Main class:

org.apache.predictionio.workflow.CreateWorkflow

Arguments:

--engine-id

com.actionml.RecommendationEngine

--engine-version

59b11bc973a65a09bc1d1dee3db026e5c906e4f9

--engine-variant

file:/pio/engines/ob-ur-live/engine.json

--verbosity

0

--json-extractor

Both

--env

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs

System properties:

spark.driver.memory -> 4g

SPARK_SUBMIT -> true

spark.files ->
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml

spark.app.name -> org.apache.predictionio.workflow.CreateWorkflow

spark.driver.extraJavaOptions -> -Dpio.log.dir=/pio

spark.jars ->
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/pio/lib/pio-assembly-0.12.1.jar

spark.submit.deployMode -> client

spark.master -> local

spark.driver.extraClassPath ->
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf

Classpath elements:

file:/pio/lib/pio-assembly-0.12.1.jar

file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar

file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar

file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar

file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar

file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar

file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar

file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar

file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar





[INFO] [RecommendationEngine$]



               _   _             __  __ _

     /\       | | (_)           |  \/  | |

    /  \   ___| |_ _  ___  _ __ | \  / | |

   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |

  / ____ \ (__| |_| | (_) | | | | |  | | |____

/_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|







[INFO] [Engine] Extracting datasource params...

[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be
used.

[INFO] [Engine] Datasource params:
(,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add,
view),None,None))

[INFO] [Engine] Extracting preparator params...

[INFO] [Engine] Preparator params: (,Empty)

[INFO] [Engine] Extracting serving params...

[INFO] [Engine] Serving params: (,Empty)

Exception in thread "main" java.net.ConnectException

        at
org.apache.predictionio.shaded.org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:168)

        at
org.apache.predictionio.shaded.org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:561)

        at
org.apache.predictionio.shaded.org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:822)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:183)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:210)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:155)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)

        at
org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)

        at java.lang.Thread.run(Thread.java:748)







Thanks,

Wojciech



From: Pat Ferrel <pa...@actionml.com>
Sent: 23 May 2018 14:28
To: Ambuj Sharma <am...@getamplify.com>; user@predictionio.apache.org; Wojciech Kowalski <wo...@tomandco.co.uk>
Subject: RE: Problem with training in yarn cluster



I noticed the appName is different for DataSource (“shop _live”) and
Algorithm (“shop_live”). AppNames must match.
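In the Universal Recommender, the appName appears twice in engine.json: once under the datasource params and once under the algorithm params, and the two strings must be identical. An abridged sketch (field names follow the UR template; the values are taken from this thread):

```json
{
  "datasource": {
    "params": {
      "appName": "shop_live",
      "eventNames": ["purchase", "basket-add", "wishlist-add", "view"]
    }
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "shop_live",
        "eventNames": ["purchase", "view"]
      }
    }
  ]
}
```

A stray space, as in "shop _live", makes the DataSource read events from an app that does not exist.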



Also, the eventNames are different, which should be OK, but it's still a
question: why input something that is not used? Given the meaning of the
events, I'd use them all for recommendations, but you may eventually want to
create shopping-cart and wishlist models separately, since these will yield
"complementary purchases" and "things you may be missing" in the wishlist.




From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org
Date: May 23, 2018 at 5:17:06 AM
To: Ambuj Sharma <am...@getamplify.com>, user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster



Hello again,



After moving HBase from Docker to the Dataproc cluster (probably DNS/hostname
resolution issues), there is no more HBase error, but training still stops:



[INFO] [RecommendationEngine$]



               _   _             __  __ _

     /\       | | (_)           |  \/  | |

    /  \   ___| |_ _  ___  _ __ | \  / | |

   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |

  / ____ \ (__| |_| | (_) | | | | |  | | |____

 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|







[INFO] [Engine] Extracting datasource params...

[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.

[INFO] [Engine] Datasource params: (,DataSourceParams(shop
_live,List(purchase, basket-add, wishlist-add, view),None,None))

[INFO] [Engine] Extracting preparator params...

[INFO] [Engine] Preparator params: (,Empty)

[INFO] [Engine] Extracting serving params...

[INFO] [Engine] Serving params: (,Empty)

[INFO] [log] Logging initialized @10046ms

[INFO] [Server] jetty-9.2.z-SNAPSHOT

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}

[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}

[INFO] [Server] Started @10430ms

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}

[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!

[INFO] [DataSource]

╔════════════════════════════════════════════════════════════╗

║ Init DataSource                                            ║

║ ══════════════════════════════════════════════════════════ ║

║ App name                      shop _live             ║

║ Event window                  None                         ║

║ Event names                   List(purchase, basket-add, wishlist-add, view) ║

║ Min events per user           None                         ║

╚════════════════════════════════════════════════════════════╝



[INFO] [URAlgorithm]

╔════════════════════════════════════════════════════════════╗

║ Init URAlgorithm                                           ║

║ ══════════════════════════════════════════════════════════ ║

║ App name                      shop_live             ║

║ ES index name                 oburindex                    ║

║ ES type name                  items                        ║

║ RecsModel                     all                          ║

║ Event names                   List(purchase, view)         ║

║ ══════════════════════════════════════════════════════════ ║

║ Random seed                   -1931119310                  ║

║ MaxCorrelatorsPerEventType    50                           ║

║ MaxEventsPerEventType         500                          ║

║ BlacklistEvents               List(purchase)               ║

║ ══════════════════════════════════════════════════════════ ║

║ User bias                     1.0                          ║

║ Item bias                     1.0                          ║

║ Max query events              100                          ║

║ Limit                         20                           ║

║ ══════════════════════════════════════════════════════════ ║

║ Rankings:                                                  ║

║ popular                       Some(popRank)                ║

╚════════════════════════════════════════════════════════════╝



[INFO] [Engine$] EngineWorkflow.train

[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a

[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93

[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)

[INFO] [Engine$] Data sanity check is on.

[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.

[INFO] [ServerConnector] Stopped Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:0}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a14c082{/api,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@10e699f8{/,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4462fa8{/static,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@78496d94{/executors,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,UNAVAILABLE,@Spark}

[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,UNAVAILABLE,@Spark}

[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@e1518c9)

[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerJobEnd(0,1527077245287,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))





Also, in stderr (I think), there is this:

[Stage 0:>                                                          (0 + 0) / 5]



Yarn app info:

*User:*

pio <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>

*Name:*

org.apache.predictionio.workflow.CreateWorkflow

*Application Type:*

SPARK

*Application Tags:*

*Application Priority:*

0 (Higher Integer value indicates higher priority)

*YarnApplicationState:*

FINISHED

*Queue:*

default <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>

*FinalStatus Reported by AM:*

FAILED

*Started:*

Wed May 23 12:06:44 +0000 2018

*Elapsed:*

40sec

*Tracking URL:*

History <http://pio-cluster-m:8088/proxy/application_1526996273517_0030/>

*Log Aggregation Status:*

DISABLED

*Diagnostics:*

*Exception was thrown 5 time(s) from Reporter thread.*

*Unmanaged Application:*

false

*Application Node Label expression:*

<Not set>

*AM container Node Label expression:*

<DEFAULT_PARTITION>







Thanks,

Wojciech



*From: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Sent: *23 May 2018 11:26
*To: *Ambuj Sharma <am...@getamplify.com>; user@predictionio.apache.org
*Subject: *RE: Problem with training in yarn cluster



Hi,



Ok so full command now is:

pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g
--driver-memory 4g --deploy-mode cluster --master yarn



The errors stopped after removing --executor-cores 2 --driver-cores 2.

I found this error: Uncaught exception:
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid
resource request, requested virtual cores < 0, or requested virtual cores >
max configured, requestedVirtualCores=4, maxVirtualCores=2
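That exception means the container request exceeded YARN's per-container vcore ceiling (here maxVirtualCores=2). As a sketch, raising the ceiling in yarn-site.xml on the ResourceManager would look like the following; the property name is standard Hadoop YARN configuration, and the value 4 is only illustrative:

```xml
<!-- yarn-site.xml on the ResourceManager node; restart YARN after editing. -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <!-- must be >= the vcores requested via --executor-cores / --driver-cores -->
  <value>4</value>
</property>
```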



But now I have a problem with HBase :/



I have the HBase host set:

declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"
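For context, the HBase-related storage settings in pio-env.sh usually travel together; a minimal sketch (variable names from the stock pio-env.sh template; the hostname is this cluster's, the port value is illustrative) would be:

```shell
# pio-env.sh -- HBase event storage source
# Hostname "pio-gc" is from this thread; PORTS value is illustrative.
declare -x PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-gc
declare -x PIO_STORAGE_SOURCES_HBASE_PORTS=0
```

Note that the stack trace below resolves "hbase-master" rather than "pio-gc", which suggests an hbase-site.xml on the classpath may be overriding these settings.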



[INFO] [Engine$] EngineWorkflow.train

[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e

[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4

[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)

[INFO] [Engine$] Data sanity check is on.

[ERROR] [StorageClient] HBase master is not running (ZooKeeper
ensemble: pio-cluster-m). Please make sure that HBase is running
properly, and that the configuration is pointing at the correct
ZooKeeper ensemble.

[ERROR] [Storage$] Error initializing storage client for source HBASE.

org.apache.hadoop.hbase.MasterNotRunningException:
com.google.protobuf.ServiceException: java.net.UnknownHostException:
unknown host: hbase-master

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)

        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)

        at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

        at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)

        at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)

        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)

        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)

        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)

        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)

        at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)

        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)

        at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)

        at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)

        at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)

        at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)

        at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)

        at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)

        at com.actionml.DataSource.readTraining(DataSource.scala:76)

        at com.actionml.DataSource.readTraining(DataSource.scala:48)

        at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)

        at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)

        at org.apache.predictionio.controller.Engine.train(Engine.scala:176)

        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)

        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)

        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)

Caused by: com.google.protobuf.ServiceException:
java.net.UnknownHostException: unknown host: hbase-master

        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)

        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)

        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)

        ... 36 more

Caused by: java.net.UnknownHostException: unknown host: hbase-master

        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)

        at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)

        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)

        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)

        ... 41 more







*From: *Ambuj Sharma <am...@getamplify.com>
*Sent: *23 May 2018 08:59
*To: *user@predictionio.apache.org
*Cc: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Subject: *Re: Problem with training in yarn cluster



Hi Wojciech,

I also faced many problems while setting up YARN with PredictionIO. This may
be a case where YARN is trying to find the pio.log file on the HDFS cluster.
You can try "--master yarn --deploy-mode client"; you need to pass this
configuration with pio train

e.g., pio train -- --master yarn --deploy-mode client








Thanks and Regards

Ambuj Sharma

Sunrise may late, But Morning is sure.....

Team ML

Betaout



On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

Actually you might search the archives for “yarn” because I don’t recall
how the setup works off hand.



Archives here:
https://lists.apache.org/list.html?user@predictionio.apache.org



Also check the Spark Yarn requirements and remember that `pio train … --
various Spark params` allows you to pass arbitrary Spark params exactly as
you would to spark-submit on the pio command line. The double dash
separates PIO and Spark params.
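Concretely, the split looks like this (the values are illustrative, taken from the command earlier in this thread); everything after the lone `--` is handed to spark-submit unchanged:

```shell
# pio's own flags come first; the bare "--" hands the rest to spark-submit
pio train --scratch-uri hdfs://pio-cluster-m/pio \
  -- --master yarn --deploy-mode cluster \
     --executor-memory 4g --driver-memory 4g
```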




From: Pat Ferrel <pa...@occamsmachete.com> <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>, Wojciech Kowalski <wo...@tomandco.co.uk>
<wo...@tomandco.co.uk>


Subject:  RE: Problem with training in yarn cluster



What is the command line for `pio train …`? Specifically, are you using
yarn-cluster mode? This causes the driver code, which is a PIO process, to
be executed on an executor; special setup is required for this.




From: Wojciech Kowalski <wo...@tomandco.co.uk> <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  RE: Problem with training in yarn cluster



Hello,



Actually I have another error in logs that is actually preventing train as
well:



[INFO] [RecommendationEngine$]



               _   _             __  __ _

     /\       | | (_)           |  \/  | |

    /  \   ___| |_ _  ___  _ __ | \  / | |

   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |

  / ____ \ (__| |_| | (_) | | | | |  | | |____

 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|







[INFO] [Engine] Extracting datasource params...

[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.

[INFO] [Engine] Datasource params:
(,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add,
view),None,None))

[INFO] [Engine] Extracting preparator params...

[INFO] [Engine] Preparator params: (,Empty)

[INFO] [Engine] Extracting serving params...

[INFO] [Engine] Serving params: (,Empty)

[INFO] [log] Logging initialized @6774ms

[INFO] [Server] jetty-9.2.z-SNAPSHOT

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}

[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}

[INFO] [Server] Started @7040ms

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}

[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!

[ERROR] [ApplicationMaster] Uncaught exception:



Thanks,

Wojciech



*From: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Sent: *22 May 2018 23:20
*To: *user@predictionio.apache.org
*Subject: *Problem with training in yarn cluster



Hello, I am trying to set up a distributed cluster with all services
separated, but I have a problem while running train:



log4j:ERROR setFile(null,true) call failed.

java.io.FileNotFoundException: /pio/pio.log (No such file or directory)

        at java.io.FileOutputStream.open0(Native Method)

        at java.io.FileOutputStream.open(FileOutputStream.java:270)

        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)

        at java.io.FileOutputStream.<init>(FileOutputStream.java:133)

        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)

        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)

        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)

        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)

        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)

        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)

        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)

        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)

        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)

        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)

        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)

        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)

        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)

        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)

        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)

        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)

        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)

        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)

        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)





Setup:

HBase

Hadoop

HDFS

Spark cluster with YARN



Training in cluster mode

I assume the Spark worker is trying to save the log to /pio/pio.log on the
worker machine instead of on the pio host. How can I set the pio log
destination to an HDFS path?



Or any other advice?
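One workaround worth trying (my assumption, not something confirmed in this thread) is to stop log4j from writing to a local file at all in cluster mode, so the missing /pio/pio.log path is never opened. A conf/log4j.properties sketch using only a console appender, whose output YARN then captures per container:

```properties
# conf/log4j.properties -- log to the console instead of /pio/pio.log
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %c{1} - %m%n
```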



Thanks,

Wojciech

RE: Problem with training in yarn cluster

Posted by Pat Ferrel <pa...@actionml.com>.
I noticed the appName is different for DataSource (“shop _live”) and
Algorithm (“shop_live”). AppNames must match.

Also the eventNames are different, which should be OK, but it's still a
question: why input something that is not used? Given the meaning of the
events, I'd use them all for recommendations, but you may eventually want to
create shopping-cart and wishlist models separately, since these will yield
"complementary purchases" and "things you may be missing" in the wishlist.


From: Wojciech Kowalski <wo...@tomandco.co.uk> <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 23, 2018 at 5:17:06 AM
To: Ambuj Sharma <am...@getamplify.com> <am...@getamplify.com>,
user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject:  RE: Problem with training in yarn cluster

Hello again,



After moving HBase from Docker to the Dataproc cluster (probably DNS/hostname
resolution issues), there is no more HBase error, but training still stops:



[INFO] [RecommendationEngine$]



               _   _             __  __ _

     /\       | | (_)           |  \/  | |

    /  \   ___| |_ _  ___  _ __ | \  / | |

   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |

  / ____ \ (__| |_| | (_) | | | | |  | | |____

 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|







[INFO] [Engine] Extracting datasource params...

[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.

[INFO] [Engine] Datasource params: (,DataSourceParams(shop
_live,List(purchase, basket-add, wishlist-add, view),None,None))

[INFO] [Engine] Extracting preparator params...

[INFO] [Engine] Preparator params: (,Empty)

[INFO] [Engine] Extracting serving params...

[INFO] [Engine] Serving params: (,Empty)

[INFO] [log] Logging initialized @10046ms

[INFO] [Server] jetty-9.2.z-SNAPSHOT

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}

[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}

[INFO] [Server] Started @10430ms

[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}

[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!

[INFO] [DataSource]

╔════════════════════════════════════════════════════════════╗

║ Init DataSource                                            ║

║ ══════════════════════════════════════════════════════════ ║

║ App name                      shop _live             ║

║ Event window                  None                         ║

║ Event names                   List(purchase, basket-add, wishlist-add, view) ║

║ Min events per user           None                         ║

╚════════════════════════════════════════════════════════════╝



[INFO] [URAlgorithm]

╔════════════════════════════════════════════════════════════╗

║ Init URAlgorithm                                           ║

║ ══════════════════════════════════════════════════════════ ║

║ App name                      shop_live             ║

║ ES index name                 oburindex                    ║

║ ES type name                  items                        ║

║ RecsModel                     all                          ║

║ Event names                   List(purchase, view)         ║

║ ══════════════════════════════════════════════════════════ ║

║ Random seed                   -1931119310                  ║

║ MaxCorrelatorsPerEventType    50                           ║

║ MaxEventsPerEventType         500                          ║

║ BlacklistEvents               List(purchase)               ║

║ ══════════════════════════════════════════════════════════ ║

║ User bias                     1.0                          ║

║ Item bias                     1.0                          ║

║ Max query events              100                          ║

║ Limit                         20                           ║

║ ══════════════════════════════════════════════════════════ ║

║ Rankings:                                                  ║

║ popular                       Some(popRank)                ║

╚════════════════════════════════════════════════════════════╝



[INFO] [Engine$] EngineWorkflow.train

[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a

[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93

[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)

[INFO] [Engine$] Data sanity check is on.

[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.

[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.


        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)

        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)

        ... 36 more

Caused by: java.net.UnknownHostException: unknown host: hbase-master

        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)

        at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)

        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)

        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)

        ... 41 more







From: Ambuj Sharma <am...@getamplify.com>
Sent: 23 May 2018 08:59
To: user@predictionio.apache.org
Cc: Wojciech Kowalski <wo...@tomandco.co.uk>
Subject: Re: Problem with training in yarn cluster



Hi Wojciech,

I also faced many problems while setting up YARN with PredictionIO. This may
be a case where YARN is trying to find the pio.log file on the HDFS cluster.
You can try "--master yarn --deploy-mode client"; you need to pass this
configuration with pio train,

e.g., pio train -- --master yarn --deploy-mode client








Thanks and Regards

Ambuj Sharma

Sunrise may late, But Morning is sure.....

Team ML

Betaout



On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

Actually, you might search the archives for “yarn” because I don’t recall
offhand how the setup works.



Archives here:
https://lists.apache.org/list.html?user@predictionio.apache.org



Also check the Spark Yarn requirements and remember that `pio train … --
various Spark params` allows you to pass arbitrary Spark params exactly as
you would to spark-submit on the pio command line. The double dash
separates PIO and Spark params.
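
For example, a sketch of such a command line (the memory settings here are
just illustrative, not a recommendation):

pio train -- --master yarn --deploy-mode cluster --executor-memory 4g --driver-memory 4g

Everything before the `--` is parsed by pio itself; everything after it is
handed to spark-submit unchanged.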




From: Pat Ferrel <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>, Wojciech Kowalski <wo...@tomandco.co.uk>



Subject:  RE: Problem with training in yarn cluster



What is the command line for `pio train …`? Specifically, are you using
yarn-cluster mode? This causes the driver code, which is a PIO process, to
be executed on an executor. Special setup is required for this.
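
One sketch of that setup, aimed at the /pio/pio.log error (the file name and
log pattern below are assumptions; the flags are standard spark-submit
options): ship a log4j.properties that logs to the console instead of a
local file, so the driver running inside a YARN container doesn't try to
open /pio/pio.log and YARN's log aggregation captures the output instead:

# conf/log4j.properties (hypothetical override)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n

# ship it and point the driver at it
pio train -- --master yarn --deploy-mode cluster \
  --files conf/log4j.properties \
  --driver-java-options "-Dlog4j.configuration=log4j.properties"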




From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
Subject:  RE: Problem with training in yarn cluster



Hello,



Actually, I have another error in the logs that is preventing training as
well:



[INFO] [RecommendationEngine$]

               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|

[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:



Thanks,

Wojciech



From: Wojciech Kowalski <wo...@tomandco.co.uk>
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster



Hello, I am trying to set up a distributed cluster with all services
separated, but I have a problem while running train:



log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)





Setup:
HBase
Hadoop
HDFS
Spark cluster with YARN

Training in cluster mode.

I assume the Spark worker is trying to save the log to /pio/pio.log on the
worker machine instead of the pio host. How can I set the pio log
destination to an HDFS path?



Or any other advice?



Thanks,

Wojciech

RE: Problem with training in yarn cluster

Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
Hello again,

After moving HBase from Docker to the Dataproc cluster (probably DNS/hostname resolution issues), there is no more HBase error, but training still stops:

[INFO] [RecommendationEngine$] 

               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|


      
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[INFO] [DataSource] 
╔════════════════════════════════════════════════════════════╗
║ Init DataSource                                            ║
║ ══════════════════════════════════════════════════════════ ║
║ App name                      shop _live             ║
║ Event window                  None                         ║
║ Event names                   List(purchase, basket-add, wishlist-add, view) ║
║ Min events per user           None                         ║
╚════════════════════════════════════════════════════════════╝

[INFO] [URAlgorithm] 
╔════════════════════════════════════════════════════════════╗
║ Init URAlgorithm                                           ║
║ ══════════════════════════════════════════════════════════ ║
║ App name                      shop_live             ║
║ ES index name                 oburindex                    ║
║ ES type name                  items                        ║
║ RecsModel                     all                          ║
║ Event names                   List(purchase, view)         ║
║ ══════════════════════════════════════════════════════════ ║
║ Random seed                   -1931119310                  ║
║ MaxCorrelatorsPerEventType    50                           ║
║ MaxEventsPerEventType         500                          ║
║ BlacklistEvents               List(purchase)               ║
║ ══════════════════════════════════════════════════════════ ║
║ User bias                     1.0                          ║
║ Item bias                     1.0                          ║
║ Max query events              100                          ║
║ Limit                         20                           ║
║ ══════════════════════════════════════════════════════════ ║
║ Rankings:                                                  ║
║ popular                       Some(popRank)                ║
╚════════════════════════════════════════════════════════════╝

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a
[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)
[INFO] [Engine$] Data sanity check is on.
[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.
[INFO] [ServerConnector] Stopped Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:0}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7a14c082{/api,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@10e699f8{/,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@4462fa8{/static,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@78496d94{/executors,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@724eae6a{/stages,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,UNAVAILABLE,@Spark}
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@e1518c9)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1527077245287,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))


Also in stderr(?) this:
[Stage 0:>                                                          (0 + 0) / 5]

Yarn app info:
User: pio
Name: org.apache.predictionio.workflow.CreateWorkflow
Application Type: SPARK
Application Tags:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: FINISHED
Queue: default
FinalStatus Reported by AM: FAILED
Started: Wed May 23 12:06:44 +0000 2018
Elapsed: 40sec
Tracking URL: History
Log Aggregation Status: DISABLED
Diagnostics: Exception was thrown 5 time(s) from Reporter thread.
Unmanaged Application: false
Application Node Label expression: <Not set>
AM container Node Label expression: <DEFAULT_PARTITION>


Thanks,
Wojciech

From: Wojciech Kowalski
Sent: 23 May 2018 11:26
To: Ambuj Sharma; user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster

Hi,

Ok, so the full command now is:
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g --driver-memory 4g --deploy-mode cluster --master yarn

The errors stopped after removing --executor-cores 2 --driver-cores 2.
I found this error: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=4, maxVirtualCores=2
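
(For reference: the maxVirtualCores=2 cap comes from the ResourceManager's
scheduler configuration, not from Spark. If containers with more vcores are
genuinely needed, it can be raised in yarn-site.xml on the ResourceManager;
the value below is illustrative:)

<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>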

But now I have a problem with HBase:

I have the HBase host set:
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper ensemble: pio-cluster-m). Please make sure that HBase is running properly, and that the configuration is pointing at the correct ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
        at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
        at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
        at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
        at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
        at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
        at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
        at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
        at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
        at com.actionml.DataSource.readTraining(DataSource.scala:76)
        at com.actionml.DataSource.readTraining(DataSource.scala:48)
        at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
        at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
        at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
        ... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
        at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        ... 41 more
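
(A note on this failure mode, as I read it, not verified:
PIO_STORAGE_SOURCES_HBASE_HOSTS only points the client at the ZooKeeper
ensemble. The HBase master then advertises its own hostname through
ZooKeeper, here the Docker hostname "hbase-master", and every node that can
run the driver must be able to resolve that name. A DNS entry, or an
/etc/hosts line like the following on each YARN node, would be one way; the
IP is of course illustrative:)

10.0.0.5    hbase-master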



From: Ambuj Sharma
Sent: 23 May 2018 08:59
To: user@predictionio.apache.org
Cc: Wojciech Kowalski
Subject: Re: Problem with training in yarn cluster

Hi Wojciech,
I also faced many problems while setting up YARN with PredictionIO. This may be a case where YARN is trying to find the pio.log file on the HDFS cluster. You can try "--master yarn --deploy-mode client"; you need to pass this configuration with pio train,
e.g., pio train -- --master yarn --deploy-mode client





Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout

On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Actually, you might search the archives for “yarn” because I don’t recall offhand how the setup works.

Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org

Also check the Spark Yarn requirements and remember that `pio train … -- various Spark params` allows you to pass arbitrary Spark params exactly as you would to spark-submit on the pio command line. The double dash separates PIO and Spark params. 


From: Pat Ferrel <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>, Wojciech Kowalski <wo...@tomandco.co.uk>

Subject:  RE: Problem with training in yarn cluster 

What is the command line for `pio train …`? Specifically, are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this.


From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
Subject:  RE: Problem with training in yarn cluster 

Hello,
 
Actually, I have another error in the logs that is preventing training as well:
 
[INFO] [RecommendationEngine$] 
 
               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|
 
 
      
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception: 
 
Thanks,
Wojciech
 
From: Wojciech Kowalski
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster
 
Hello, I am trying to set up a distributed cluster with all services separated, but I have a problem while running train:
 
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 
 
setup:
hbase
Hadoop
Hdfs
Spark cluster with yarn
 
Training in cluster mode
I assume the Spark worker is trying to save the log to /pio/pio.log on the worker machine instead of the pio host. How can I point the pio log destination at an HDFS path?
 
Or any other advice?
 
Thanks,
Wojciech
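One common way around the FileNotFoundException above is to stop the cluster-side driver from using PIO's file appender at all, since /pio/pio.log only exists on the pio host. A sketch of a log4j.properties override (appender names here are illustrative; check the log4j.properties that PIO actually ships in its conf/ directory):

```properties
# Route everything to the console instead of a local file; YARN then
# collects this output into its own container logs.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c{1} - %m%n
```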
 




RE: Problem with training in yarn cluster

Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
Hi,

OK, so the full command is now:
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g --driver-memory 4g --deploy-mode cluster --master yarn

The errors stopped after removing `--executor-cores 2 --driver-cores 2`.
I found this error: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=4, maxVirtualCores=2
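The InvalidResourceRequestException means the core request exceeded YARN's per-container cap. A hedged sketch of where that cap lives (the value of 2 is what the error reports; the file location is the usual default, not verified for this cluster):

```xml
<!-- yarn-site.xml on the cluster: the per-container vcore ceiling -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
</property>
```

With that cap, any --executor-cores / --driver-cores request above 2 (or a combination YARN rounds up past it) is rejected before containers start.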

But now I have a problem with HBase :/

I have the HBase host set:
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"

[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper ensemble: pio-cluster-m). Please make sure that HBase is running properly, and that the configuration is pointing at the correct ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
	at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
	at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
	at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
	at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
	at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
	at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
	at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
	at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
	at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
	at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
	at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
	at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
	at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
	at com.actionml.DataSource.readTraining(DataSource.scala:76)
	at com.actionml.DataSource.readTraining(DataSource.scala:48)
	at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
	at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
	at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
	at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
	at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
	at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
	... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
	at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
	... 41 more
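The UnknownHostException for `hbase-master` above usually means the HBase/ZooKeeper configuration visible to the driver still names a host that does not resolve from the YARN node where the driver runs. Two things worth checking in yarn-cluster mode (the property name is standard HBase; the value mirrors the ensemble named in the error, not verified for this cluster): PIO_STORAGE_SOURCES_HBASE_HOSTS must be exported in the environment the driver actually sees, and the hbase-site.xml on its classpath should carry a resolvable quorum:

```xml
<!-- hbase-site.xml as seen by the node running the driver -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>pio-cluster-m</value>
</property>
```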



From: Ambuj Sharma
Sent: 23 May 2018 08:59
To: user@predictionio.apache.org
Cc: Wojciech Kowalski
Subject: Re: Problem with training in yarn cluster

Hi Wojciech,
 I also faced many problems while setting up YARN with PredictionIO. This may be the case where YARN is trying to find the pio.log file on the HDFS cluster. You can try "--master yarn --deploy-mode client"; you need to pass this configuration with pio train,
e.g., pio train -- --master yarn --deploy-mode client





Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout

On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Actually you might search the archives for “yarn” because I don’t recall how the setup works off hand.

Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org

Also check the Spark Yarn requirements and remember that `pio train … -- various Spark params` allows you to pass arbitrary Spark params exactly as you would to spark-submit on the pio command line. The double dash separates PIO and Spark params. 
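Pat's point about the double dash can be illustrated with a small, generic argument-splitting sketch (this is not PIO's actual parser, just the convention it follows; the hdfs path is a placeholder):

```shell
# Simulate the argument vector of:
#   pio train --scratch-uri hdfs://ns/pio -- --master yarn --deploy-mode client
set -- --scratch-uri hdfs://ns/pio -- --master yarn --deploy-mode client

pio_args=""
spark_args=""
seen_sep=0
for a in "$@"; do
  if [ "$a" = "--" ] && [ "$seen_sep" -eq 0 ]; then
    seen_sep=1        # the first bare -- switches sides
    continue
  fi
  if [ "$seen_sep" -eq 0 ]; then
    pio_args="${pio_args:+$pio_args }$a"        # parsed by pio itself
  else
    spark_args="${spark_args:+$spark_args }$a"  # forwarded to spark-submit
  fi
done

echo "pio args:   $pio_args"
echo "spark args: $spark_args"
```

Everything after the separator is handed to spark-submit untouched, which is why arbitrary Spark flags work there.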





Re: Problem with training in yarn cluster

Posted by Ambuj Sharma <am...@getamplify.com>.
Hi Wojciech,
 I also faced many problems while setting up YARN with PredictionIO. This may
be the case where YARN is trying to find the pio.log file on the HDFS cluster.
You can try "--master yarn --deploy-mode client"; you need to pass this
configuration with pio train
e.g., pio train -- --master yarn --deploy-mode client




Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout

RE: Problem with training in yarn cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Actually you might search the archives for “yarn” because I don’t recall
how the setup works off hand.

Archives here:
https://lists.apache.org/list.html?user@predictionio.apache.org

Also check the Spark Yarn requirements and remember that `pio train … --
various Spark params` allows you to pass arbitrary Spark params exactly as
you would to spark-submit on the pio command line. The double dash
separates PIO and Spark params.


*To: *user@predictionio.apache.org
*Subject: *Problem with training in yarn cluster



Hello, I am trying to set up a distributed cluster with all services
separated, but I have a problem while running train:



log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)





Setup:

HBase

Hadoop

HDFS

Spark cluster with YARN



Training in cluster mode

I assume the Spark worker is trying to save the log to /pio/pio.log on the
worker machine instead of on the pio host. How can I set the pio log
destination to an HDFS path?



Or any other advice?
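[One common approach, sketched under assumptions: the thread does not confirm this fix, and the file name below is illustrative. In yarn-cluster mode the driver runs inside a YARN container, where the hard-coded /pio/pio.log path usually does not exist or is not writable. Rather than pointing log4j at an HDFS path (the log4j 1.x FileAppender only writes to local files), you can override PIO's file appender with a console appender so each container logs to its YARN-captured stderr:]

```shell
# Sketch, assuming a log4j 1.x setup like PIO's: replace the file appender
# (which points at /pio/pio.log) with a console appender, so the driver and
# executor containers log to YARN's captured stderr instead of a local path.
# The file name /tmp/log4j-yarn.properties is an assumption for this example.
cat > /tmp/log4j-yarn.properties <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %-5p %c - %m%n
EOF
```

[The file can then be shipped to the containers with spark-submit's --files flag and selected via -Dlog4j.configuration; the logs become retrievable afterwards with `yarn logs -applicationId <appId>` instead of from a fixed path.]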



Thanks,

Wojciech

RE: Problem with training in yarn cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
What is the command line for `pio train …`? Specifically, are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this.
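[For reference, a yarn-cluster invocation might look like the sketch below. Flag values and the log4j properties file name are illustrative assumptions, not taken from the thread; the general mechanism is that `pio train` passes the arguments after the bare `--` through to spark-submit. The command is assembled and printed here rather than executed:]

```shell
# Illustrative only: a yarn-cluster `pio train` command. Everything after the
# bare `--` is forwarded to spark-submit. The properties file name is an
# assumption carried over from a hypothetical log4j console-appender override.
TRAIN_CMD='pio train -- \
  --master yarn \
  --deploy-mode cluster \
  --files /tmp/log4j-yarn.properties \
  --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-yarn.properties \
  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-yarn.properties'
echo "$TRAIN_CMD"
```

[With --files, YARN localizes the properties file into each container's working directory, which is why the -Dlog4j.configuration values reference the bare file name rather than the /tmp path.]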


From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
Subject:  RE: Problem with training in yarn cluster  

Hello,

 

Actually, I have another error in the logs that is preventing training as well:

 

[INFO] [RecommendationEngine$]  
 
               _   _             __  __ _
     /\       | | (_)           |  \/  | |
    /  \   ___| |_ _  ___  _ __ | \  / | |
   / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
  / ____ \ (__| |_| | (_) | | | | |  | | |____
 /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|
 
 
       
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:  
 

Thanks,

Wojciech

 

From: Wojciech Kowalski
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster

 

Hello, I am trying to set up a distributed cluster with all services separated, but I have a problem while running train:

 

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
        at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
        at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 

 

Setup:

HBase

Hadoop

HDFS

Spark cluster with YARN

 

Training in cluster mode

I assume the Spark worker is trying to save the log to /pio/pio.log on the worker machine instead of on the pio host. How can I set the pio log destination to an HDFS path?

 

Or any other advice?

 

Thanks,

Wojciech