Posted to user@predictionio.apache.org by Wojciech Kowalski <wo...@tomandco.co.uk> on 2018/05/22 21:20:28 UTC
Problem with training in yarn cluster
Hello, I am trying to set up a distributed cluster with all services separated, but I have a problem while running training:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
setup:
HBase
Hadoop
HDFS
Spark cluster with YARN
Training in cluster mode
I assume the Spark worker is trying to save its log to /pio/pio.log on the worker machine instead of on the pio host. How can I point the pio log destination at an HDFS path?
Or any other advice?
Thanks,
Wojciech
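For the log4j FileAppender failure above, one common workaround is to stop writing to a local file path at all and log to the console, which YARN then captures per container. A minimal sketch, assuming PIO's stock log4j 1.x setup (the /tmp staging path is illustrative):

```shell
# Sketch: replace the /pio/pio.log FileAppender with a ConsoleAppender so
# that in yarn-cluster mode the driver log goes to container stderr (which
# YARN collects) instead of a path that only exists on the pio host.
cat > /tmp/log4j.properties <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c - %m%n
EOF
grep -c 'ConsoleAppender' /tmp/log4j.properties
```

PIO already ships its log4j.properties to the cluster (it appears under --files in the submission command later in this thread), so swapping the appender in that file changes what the AM and executors do with log output.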
RE: Problem with training in yarn cluster
Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
Hello,
Actually, I have another error in the logs that is preventing training as well:
[INFO] [RecommendationEngine$] (ActionML ASCII-art banner)
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:
Thanks,
Wojciech
From: Wojciech Kowalski
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster
RE: Problem with training in yarn cluster
Posted by Pat Ferrel <pa...@actionml.com>.
Check the Spark GUI; it has a good way to browse logs on and in workers.
I don't know your data, but 4g is pretty small. How many executors are in your
cluster? Also, why do you need YARN? It adds a fair bit of complexity.
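Besides the Spark UI, the YARN CLI can pull logs for a finished application; a sketch (the application id is the one shown in the YARN app info later in this thread, and log aggregation must be enabled for this to return anything):

```shell
# Fetch all container logs (AM/driver and executors) for a finished app.
# Requires yarn.log-aggregation-enable=true; the id is illustrative.
yarn logs -applicationId application_1526996273517_0030 | less
```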
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: Wojciech Kowalski <wo...@tomandco.co.uk>
Date: May 23, 2018 at 6:07:47 AM
To: Pat Ferrel <pa...@actionml.com>, Ambuj Sharma <am...@getamplify.com>, user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster
I just replaced the name of the shop and a space was added there. I don't
think it has anything to do with the event names, as normal training simply
goes through.
Hmm, I can see that it has stopped working without YARN now as well :/
Some timeout, but I have no idea what it can't connect to; there is no info
about any host, etc.
pio train -- --executor-memory 4g --driver-memory 4g --verbose --master
local
[INFO] [Runner$] Submission command: /pio/vendors/spark/bin/spark-submit
--executor-memory 4g --driver-memory 4g --verbose --master local --class
org.apache.predictionio.workflow.CreateWorkflow --jars
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar
--files
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml
--driver-class-path
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf
--driver-java-options -Dpio.log.dir=/pio
file:/pio/lib/pio-assembly-0.12.1.jar --engine-id
com.actionml.RecommendationEngine --engine-version
59b11bc973a65a09bc1d1dee3db026e5c906e4f9 --engine-variant
file:/pio/engines/ob-ur-live/engine.json --verbosity 0 --json-extractor
Both --env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
Using properties file: null
Parsed arguments:
master local
deployMode null
executorMemory 4g
executorCores null
totalExecutorCores null
propertiesFile null
driverMemory 4g
driverCores null
driverExtraClassPath
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf
driverExtraLibraryPath null
driverExtraJavaOptions -Dpio.log.dir=/pio
supervise false
queue null
numExecutors null
files
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml
pyFiles null
archives null
mainClass org.apache.predictionio.workflow.CreateWorkflow
primaryResource file:/pio/lib/pio-assembly-0.12.1.jar
name org.apache.predictionio.workflow.CreateWorkflow
childArgs [--engine-id com.actionml.RecommendationEngine
--engine-version 59b11bc973a65a09bc1d1dee3db026e5c906e4f9 --engine-variant
file:/pio/engines/ob-ur-live/engine.json --verbosity 0 --json-extractor
Both --env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs]
jars
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file null:
spark.driver.memory -> 4g
spark.driver.extraJavaOptions -> -Dpio.log.dir=/pio
spark.driver.extraClassPath ->
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf
Main class:
org.apache.predictionio.workflow.CreateWorkflow
Arguments:
--engine-id
com.actionml.RecommendationEngine
--engine-version
59b11bc973a65a09bc1d1dee3db026e5c906e4f9
--engine-variant
file:/pio/engines/ob-ur-live/engine.json
--verbosity
0
--json-extractor
Both
--env
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-cluster-m,PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_VERSION=0.12.1,PIO_FS_BASEDIR=/pio/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=pio-gc,PIO_HOME=/pio,PIO_FS_ENGINESDIR=/pio/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/pio/.pio_store/models,PIO_STORAGE_SOURCES_HBASE_PORTS=16020,PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs,PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://pio-cluster-m/pio/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/vendors/elasticsearch,PIO_FS_TMPDIR=/pio/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCSE_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
System properties:
spark.driver.memory -> 4g
SPARK_SUBMIT -> true
spark.files ->
file:/pio/conf/log4j.properties,file:/pio/vendors/hadoop/conf/core-site.xml,file:/pio/vendors/hbase/conf/hbase-site.xml
spark.app.name -> org.apache.predictionio.workflow.CreateWorkflow
spark.driver.extraJavaOptions -> -Dpio.log.dir=/pio
spark.jars ->
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar,file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar,file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/pio/lib/pio-assembly-0.12.1.jar
spark.submit.deployMode -> client
spark.master -> local
spark.driver.extraClassPath ->
/pio/conf:/pio/vendors/hadoop/conf:/pio/vendors/hbase/conf
Classpath elements:
file:/pio/lib/pio-assembly-0.12.1.jar
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender-assembly-0.7.2-deps.jar
file:/pio/engines/ob-ur-live/target/scala-2.11/universal-recommender_2.11-0.7.2.jar
file:/pio/lib/spark/pio-data-s3-assembly-0.12.1.jar
file:/pio/lib/spark/pio-data-hdfs-assembly-0.12.1.jar
file:/pio/lib/spark/pio-data-localfs-assembly-0.12.1.jar
file:/pio/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar
file:/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar
file:/pio/lib/spark/pio-data-jdbc-assembly-0.12.1.jar
[INFO] [RecommendationEngine$] (ActionML ASCII-art banner)
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be
used.
[INFO] [Engine] Datasource params:
(,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add,
view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
Exception in thread "main" java.net.ConnectException
at
org.apache.predictionio.shaded.org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:168)
at
org.apache.predictionio.shaded.org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:561)
at
org.apache.predictionio.shaded.org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:822)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:183)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:210)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:155)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
at
org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.run(Thread.java:748)
Thanks,
Wojciech
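A ConnectException from the shaded HTTP client at this point in the workflow usually means a storage backend is unreachable; a quick reachability sketch, assuming the hostnames from the pio-env dump in this thread (pio-gc for Elasticsearch, pio-cluster-m for the YARN/Hadoop master) — the `probe` helper is hypothetical:

```shell
# Probe each HTTP endpoint the driver needs before running pio train.
# --max-time keeps a dead host from hanging the check.
probe() {
  curl -sf --max-time 3 "$2" >/dev/null \
    && echo "$1 ok" \
    || echo "$1 unreachable"
}
probe elasticsearch http://pio-gc:9200/
probe yarn-rm       http://pio-cluster-m:8088/
```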
From: Pat Ferrel <pa...@actionml.com>
Sent: 23 May 2018 14:28
To: Ambuj Sharma <am...@getamplify.com>; user@predictionio.apache.org; Wojciech Kowalski <wo...@tomandco.co.uk>
Subject: RE: Problem with training in yarn cluster
I noticed the appName is different for DataSource (“shop _live”) and
Algorithm (“shop_live”). AppNames must match.
Also, the eventNames are different, which should be ok, but it's still a
question: why input something that is not used? Given the meaning of the
events, I'd use them all for recommendations, but you may eventually want to
create shopping-cart and wishlist models separately, since these will yield
“complementary purchases” and “things you may be missing” in the wishlist.
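The appName mismatch is easy to miss by eye; a self-contained sketch of the check (the engine.json shape here is abbreviated, not the full Universal Recommender schema):

```shell
# Write a minimal engine.json carrying the two appName values from this
# thread, then list the distinct values: more than one line back means the
# DataSource and Algorithm sections disagree.
cat > /tmp/engine.json <<'EOF'
{
  "datasource": { "params": { "appName": "shop _live" } },
  "algorithms": [ { "params": { "appName": "shop_live" } } ]
}
EOF
grep -o '"appName": "[^"]*"' /tmp/engine.json | sort -u
```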
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org
Date: May 23, 2018 at 5:17:06 AM
To: Ambuj Sharma <am...@getamplify.com>, user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster
Hello again,
After moving HBase from Docker to the Dataproc cluster (probably DNS/hostname
resolution issues), there is no more HBase error, but training still stops:
[INFO] [RecommendationEngine$] (ActionML ASCII-art banner)
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop
_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!
[INFO] [DataSource]
╔════════════════════════════════════════════════════════════╗
║ Init DataSource ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop _live ║
║ Event window None ║
║ Event names List(purchase, basket-add, wishlist-add, view) ║
║ Min events per user None ║
╚════════════════════════════════════════════════════════════╝
[INFO] [URAlgorithm]
╔════════════════════════════════════════════════════════════╗
║ Init URAlgorithm ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop_live ║
║ ES index name oburindex ║
║ ES type name items ║
║ RecsModel all ║
║ Event names List(purchase, view) ║
║ ══════════════════════════════════════════════════════════ ║
║ Random seed -1931119310 ║
║ MaxCorrelatorsPerEventType 50 ║
║ MaxEventsPerEventType 500 ║
║ BlacklistEvents List(purchase) ║
║ ══════════════════════════════════════════════════════════ ║
║ User bias 1.0 ║
║ Item bias 1.0 ║
║ Max query events 100 ║
║ Limit 20 ║
║ ══════════════════════════════════════════════════════════ ║
║ Rankings: ║
║ popular Some(popRank) ║
╚════════════════════════════════════════════════════════════╝
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a
[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)
[INFO] [Engine$] Data sanity check is on.
[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.
[INFO] [ServerConnector] Stopped Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:0}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a14c082{/api,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@10e699f8{/,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4462fa8{/static,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@78496d94{/executors,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,UNAVAILABLE,@Spark}
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@e1518c9)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerJobEnd(0,1527077245287,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))
Also in stderr(?) this:
[Stage 0:> (0 + 0) / 5]
Yarn app info:
User: pio <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>
Name: org.apache.predictionio.workflow.CreateWorkflow
Application Type: SPARK
Application Tags:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: FINISHED
Queue: default <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>
FinalStatus Reported by AM: FAILED
Started: Wed May 23 12:06:44 +0000 2018
Elapsed: 40sec
Tracking URL: History <http://pio-cluster-m:8088/proxy/application_1526996273517_0030/>
Log Aggregation Status: DISABLED
Diagnostics: Exception was thrown 5 time(s) from Reporter thread.
Unmanaged Application: false
Application Node Label expression: <Not set>
AM container Node Label expression: <DEFAULT_PARTITION>
Thanks,
Wojciech
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Sent: 23 May 2018 11:26
To: Ambuj Sharma <am...@getamplify.com>; user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster
Hi,
Ok so full command now is:
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g
--driver-memory 4g --deploy-mode cluster --master yarn
The errors stopped after I removed --executor-cores 2 --driver-cores 2.
I found this error: Uncaught exception:
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid
resource request, requested virtual cores < 0, or requested virtual cores >
max configured, requestedVirtualCores=4, maxVirtualCores=2
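Given maxVirtualCores=2, the per-container core request has to stay at or below the YARN cap (yarn.scheduler.maximum-allocation-vcores). A sketch of the adjusted command, reusing the flags from this thread — the specific core counts are illustrative, not verified against this cluster:

```shell
# Keep each container's vcore request <= yarn.scheduler.maximum-allocation-vcores
# (2 on this cluster); requesting more makes YARN reject the allocation.
pio train --scratch-uri hdfs://pio-cluster-m/pio -- \
  --master yarn --deploy-mode cluster \
  --executor-memory 4g --driver-memory 4g \
  --executor-cores 2 --driver-cores 1
```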
But now I have a problem with HBase :/
I have the HBase host set:
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper
ensemble: pio-cluster-m). Please make sure that HBase is running
properly, and that the configuration is pointing at the correct
ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException:
com.google.protobuf.ServiceException: java.net.UnknownHostException:
unknown host: hbase-master
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
at com.actionml.DataSource.readTraining(DataSource.scala:76)
at com.actionml.DataSource.readTraining(DataSource.scala:48)
at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException:
java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
... 41 more
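The stack trace shows the HBase client resolving the default name "hbase-master" rather than the configured host, which suggests the process that failed never saw the HOSTS setting. Below is a minimal pio-env.sh sketch of the HBase storage-source variables (variable names follow PredictionIO's standard pio-env.sh; the host pio-gc is taken from the message above; this is illustrative, not a complete config):

```shell
# pio-env.sh fragment (sketch): point the HBase storage source at the
# actual HBase/ZooKeeper host instead of the default "hbase-master".
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOSTS=pio-gc

# In yarn-cluster mode the driver runs on a YARN node, so that node also
# needs this environment (or an hbase-site.xml on its classpath);
# otherwise it falls back to default host names, as in the trace above.
```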
*From: *Ambuj Sharma <am...@getamplify.com>
*Sent: *23 May 2018 08:59
*To: *user@predictionio.apache.org
*Cc: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Subject: *Re: Problem with training in yarn cluster
Hi Wojciech,
I also faced many problems while setting up YARN with PredictionIO. This may
be a case where YARN is trying to find the pio.log file on the HDFS cluster.
You can try "--master yarn --deploy-mode client". You need to pass this
configuration to pio train, e.g.:
pio train -- --master yarn --deploy-mode client
Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout
On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Actually you might search the archives for “yarn” because I don’t recall
how the setup works off hand.
Archives here:
https://lists.apache.org/list.html?user@predictionio.apache.org
Also check the Spark Yarn requirements and remember that `pio train … --
various Spark params` allows you to pass arbitrary Spark params exactly as
you would to spark-submit on the pio command line. The double dash
separates PIO and Spark params.
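The double-dash convention Pat describes can be sketched with a small, hypothetical splitter; this is not pio's actual code, just an illustration of how everything after the first `--` is passed through to spark-submit unchanged:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: arguments before the first "--" belong to pio;
# arguments after it are forwarded verbatim to spark-submit.
split_args() {
  pio_args=()
  spark_args=()
  local seen_sep=0 a
  for a in "$@"; do
    if [ "$seen_sep" -eq 0 ] && [ "$a" = "--" ]; then
      seen_sep=1
    elif [ "$seen_sep" -eq 0 ]; then
      pio_args+=("$a")
    else
      spark_args+=("$a")
    fi
  done
}

# Example shaped like the commands in this thread:
split_args --scratch-uri hdfs://pio-cluster-m/pio -- --master yarn --deploy-mode cluster
echo "pio args:   ${pio_args[*]}"
echo "spark args: ${spark_args[*]}"
```

Running it prints the pio-side and Spark-side argument lists separately.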
From: Pat Ferrel <pa...@occamsmachete.com> <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>, Wojciech Kowalski <wo...@tomandco.co.uk>
<wo...@tomandco.co.uk>
Subject: RE: Problem with training in yarn cluster
What is the command line for `pio train …`? Specifically, are you using
yarn-cluster mode? This causes the driver code, which is a PIO process, to
be executed on an executor. Special setup is required for this.
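In practice that special setup mostly means making pio's working files reachable from whichever YARN node runs the driver. A sketch using the `--scratch-uri` option (the same option that appears elsewhere in this thread; treat the HDFS path as an example):

```shell
# Sketch: keep pio's driver-side files (including pio.log) on HDFS so a
# yarn-cluster driver on an arbitrary node can reach them.
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --master yarn --deploy-mode cluster
```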
From: Wojciech Kowalski <wo...@tomandco.co.uk> <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject: RE: Problem with training in yarn cluster
Hello,
Actually, I have another error in the logs that is preventing training as
well:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params:
(,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add,
view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:
Thanks,
Wojciech
*From: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Sent: *22 May 2018 23:20
*To: *user@predictionio.apache.org
*Subject: *Problem with training in yarn cluster
Hello, I am trying to set up a distributed cluster with all services
separated, but I have a problem while running train:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Setup:
HBase
Hadoop
HDFS
Spark cluster with YARN
Training in cluster mode
I assume the Spark worker is trying to save the log to /pio/pio.log on the
worker machine instead of the pio host. How can I set the pio log
destination to an HDFS path? Or any other advice?
Thanks,
Wojciech
RE: Problem with training in yarn cluster
Posted by Pat Ferrel <pa...@actionml.com>.
I noticed the appName is different for DataSource (“shop _live”) and
Algorithm (“shop_live”). AppNames must match.
Also, the eventNames are different, which should be OK, but it’s still a
question: why input something that is not used? Given the meaning of the
events, I’d use them all for recommendations, but you may eventually want to
create shopping-cart and wishlist models separately, since these will yield
“complementary purchases” and “things you may be missing” in the wishlist.
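A minimal illustration of the appName constraint, written as a shell heredoc (the field layout follows the Universal Recommender's engine.json; this is a sketch, not a complete engine.json):

```shell
# Illustrative only: the UR reads appName from both the datasource params
# and the algorithm params, and the two strings must match exactly; note
# the stray space in "shop _live" in the logs above.
cat > /tmp/engine-appname.json <<'EOF'
{
  "datasource": { "params": { "appName": "shop_live" } },
  "algorithms": [ { "name": "ur", "params": { "appName": "shop_live" } } ]
}
EOF
# Both occurrences should print the same string:
grep -o '"appName": "[^"]*"' /tmp/engine-appname.json
```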
From: Wojciech Kowalski <wo...@tomandco.co.uk> <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 23, 2018 at 5:17:06 AM
To: Ambuj Sharma <am...@getamplify.com> <am...@getamplify.com>,
user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Subject: RE: Problem with training in yarn cluster
Hello again,
After moving hbase to dataproc cluster from docker ( probs dns/hostname
resolution issues ) no more hbase error but still training stops:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop
_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[INFO] [ContextHandler] Started
o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to
request executors before the AM has registered!
[INFO] [DataSource]
╔════════════════════════════════════════════════════════════╗
║ Init DataSource ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop _live ║
║ Event window None ║
║ Event names List(purchase, basket-add, wishlist-add, view) ║
║ Min events per user None ║
╚════════════════════════════════════════════════════════════╝
[INFO] [URAlgorithm]
╔════════════════════════════════════════════════════════════╗
║ Init URAlgorithm ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop_live ║
║ ES index name oburindex ║
║ ES type name items ║
║ RecsModel all ║
║ Event names List(purchase, view) ║
║ ══════════════════════════════════════════════════════════ ║
║ Random seed -1931119310 ║
║ MaxCorrelatorsPerEventType 50 ║
║ MaxEventsPerEventType 500 ║
║ BlacklistEvents List(purchase) ║
║ ══════════════════════════════════════════════════════════ ║
║ User bias 1.0 ║
║ Item bias 1.0 ║
║ Max query events 100 ║
║ Limit 20 ║
║ ══════════════════════════════════════════════════════════ ║
║ Rankings: ║
║ popular Some(popRank) ║
╚════════════════════════════════════════════════════════════╝
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a
[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)
[INFO] [Engine$] Data sanity check is on.
[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.
[INFO] [ServerConnector] Stopped Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:0}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a14c082{/api,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@10e699f8{/,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@4462fa8{/static,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@78496d94{/executors,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@724eae6a{/stages,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped
o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,UNAVAILABLE,@Spark}
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@e1518c9)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped!
Dropping event SparkListenerJobEnd(0,1527077245287,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))
Also in stderr(?) this:
[Stage 0:> (0 + 0) / 5]
Yarn app info:
*User:*
pio <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>
*Name:*
org.apache.predictionio.workflow.CreateWorkflow
*Application Type:*
SPARK
*Application Tags:*
*Application Priority:*
0 (Higher Integer value indicates higher priority)
*YarnApplicationState:*
FINISHED
*Queue:*
default <http://pio-cluster-m:8088/cluster/scheduler?openQueues=default>
*FinalStatus Reported by AM:*
FAILED
*Started:*
Wed May 23 12:06:44 +0000 2018
*Elapsed:*
40sec
*Tracking URL:*
History <http://pio-cluster-m:8088/proxy/application_1526996273517_0030/>
*Log Aggregation Status:*
DISABLED
*Diagnostics:*
*Exception was thrown 5 time(s) from Reporter thread.*
*Unmanaged Application:*
false
*Application Node Label expression:*
<Not set>
*AM container Node Label expression:*
<DEFAULT_PARTITION>
Thanks,
Wojciech
*From: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Sent: *23 May 2018 11:26
*To: *Ambuj Sharma <am...@getamplify.com>; user@predictionio.apache.org
*Subject: *RE: Problem with training in yarn cluster
Hi,
Ok so full command now is:
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g
--driver-memory 4g --deploy-mode cluster --master yarn
errors stopped after removing –executor-cores 2 --driver-cores 2
I found this error: Uncaught exception:
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid
resource request, requested virtual cores < 0, or requested virtual cores >
max configured, requestedVirtualCores=4, maxVirtualCores=2
But now I have problem with hbase :/
I have hbase host set:
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper
ensemble: pio-cluster-m). Please make sure that HBase is running
properly, and that the configuration is pointing at the correct
ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException:
com.google.protobuf.ServiceException: java.net.UnknownHostException:
unknown host: hbase-master
at org.apache.hadoop.hbase.client.HConnectionManager$HCoolnnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
at com.actionml.DataSource.readTraining(DataSource.scala:76)
at com.actionml.DataSource.readTraining(DataSource.scala:48)
at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException:
java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
... 41 more
*From: *Ambuj Sharma <am...@getamplify.com>
*Sent: *23 May 2018 08:59
*To: *user@predictionio.apache.org
*Cc: *Wojciech Kowalski <wo...@tomandco.co.uk>
*Subject: *Re: Problem with training in yarn cluster
Hi wojciech,
I also faced many problems while setting yarn with PredictionIO. This may
be the case where yarn is tyring to findout pio.log file on hdfs cluster.
You can try "--master yarn --deploy-mode client ". you need to pass this
configuration with pio train
e.g., pio train -- --master yarn --deploy-mode client
Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout
On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
Actually you might search the archives for “yarn” because I don’t recall
how the setup works off hand.
Archives here:
https://lists.apache.org/list.html?user@predictionio.apache.org
Also check the Spark Yarn requirements and remember that `pio train … --
various Spark params` allows you to pass arbitrary Spark params exactly as
you would to spark-submit on the pio command line. The double dash
separates PIO and Spark params.
From: Pat Ferrel <pa...@occamsmachete.com> <pa...@occamsmachete.com>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>
Date: May 22, 2018 at 4:07:38 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
<us...@predictionio.apache.org>, Wojciech Kowalski <wo...@tomandco.co.uk>
<wo...@tomandco.co.uk>
Subject: RE: Problem with training in yarn cluster
What is the command line for `pio train …`? Specifically, are you using
yarn-cluster mode? This causes the driver code, which is a PIO process, to
be executed on an executor. Special setup is required for this.
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster
Hello,
Actually, I have another error in the logs that is preventing training as
well:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:
Thanks,
Wojciech
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster
Hello, I am trying to set up a distributed cluster with all services
separated, but I have a problem while running train:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
setup:
hbase
Hadoop
Hdfs
Spark cluster with yarn
Training in cluster mode
I assume the Spark worker is trying to save the log to /pio/pio.log on the
worker machine instead of the pio host. How can I point the pio log
destination at an HDFS path? Or any other advice?
Thanks,
Wojciech
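One hedged pointer on the /pio/pio.log question above: that path comes from a log4j FileAppender in PIO's log4j configuration, and stock log4j 1.x has no appender that writes directly to HDFS. A sketch, assuming a typical conf/log4j.properties layout (the appender name `file` and the path here are illustrative, not PIO's exact defaults):

```properties
# Illustrative log4j fragment: write pio.log to a local path that exists on
# every node, or switch to a ConsoleAppender so YARN collects the output in
# the container logs instead.
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/pio/pio.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %-5p %c{2} - %m%n
```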
RE: Problem with training in yarn cluster
Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
Hello again,
After moving HBase from Docker to the Dataproc cluster (probably DNS/hostname resolution issues), there is no more HBase error, but training still stops:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @10046ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@724eae6a{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@78496d94{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4462fa8{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10e699f8{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a14c082{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:49349}
[INFO] [Server] Started @10430ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@379fcbd1{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[INFO] [DataSource]
╔════════════════════════════════════════════════════════════╗
║ Init DataSource ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop_live ║
║ Event window None ║
║ Event names List(purchase, basket-add, wishlist-add, view) ║
║ Min events per user None ║
╚════════════════════════════════════════════════════════════╝
[INFO] [URAlgorithm]
╔════════════════════════════════════════════════════════════╗
║ Init URAlgorithm ║
║ ══════════════════════════════════════════════════════════ ║
║ App name shop_live ║
║ ES index name oburindex ║
║ ES type name items ║
║ RecsModel all ║
║ Event names List(purchase, view) ║
║ ══════════════════════════════════════════════════════════ ║
║ Random seed -1931119310 ║
║ MaxCorrelatorsPerEventType 50 ║
║ MaxEventsPerEventType 500 ║
║ BlacklistEvents List(purchase) ║
║ ══════════════════════════════════════════════════════════ ║
║ User bias 1.0 ║
║ Item bias 1.0 ║
║ Max query events 100 ║
║ Limit 20 ║
║ ══════════════════════════════════════════════════════════ ║
║ Rankings: ║
║ popular Some(popRank) ║
╚════════════════════════════════════════════════════════════╝
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@4953588a
[INFO] [Engine$] Preparator: com.actionml.Preparator@715d8f93
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@50c15628)
[INFO] [Engine$] Data sanity check is on.
[WARN] [ApplicationMaster] Reporter thread fails 1 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 2 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 3 time(s) in a row.
[WARN] [ApplicationMaster] Reporter thread fails 4 time(s) in a row.
[INFO] [ServerConnector] Stopped Spark@6a00b5d1{HTTP/1.1}{0.0.0.0:0}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7ef3c37a{/stages/stage/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@4bfd8ec2{/jobs/job/kill,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7a14c082{/api,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@10e699f8{/,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@4462fa8{/static,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@3750c11b{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@65c1fb35{/executors/threadDump,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@26a6525a{/executors/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@78496d94{/executors,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@35acfdf3{/environment/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2f6d8922{/environment,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@27bf31c6{/storage/rdd/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@175b4e7c{/storage/rdd,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@468b9a16{/storage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@46a9ce75{/storage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@6b9b69f8{/stages/pool/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2ea7d76{/stages/pool,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@550be48{/stages/stage/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2271fddb{/stages/stage,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@1a3e64cf{/stages/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@724eae6a{/stages,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@720aa19c{/jobs/job/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@489e0d2e{/jobs/job,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@2679cc20{/jobs/json,null,UNAVAILABLE,@Spark}
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7a6f5572{/jobs,null,UNAVAILABLE,@Spark}
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@e1518c9)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1527077245287,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))
Also in stderr(?) this:
[Stage 0:> (0 + 0) / 5]
Yarn app info:
User: pio
Name: org.apache.predictionio.workflow.CreateWorkflow
Application Type: SPARK
Application Tags:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: FINISHED
Queue: default
FinalStatus Reported by AM: FAILED
Started: Wed May 23 12:06:44 +0000 2018
Elapsed: 40sec
Tracking URL: History
Log Aggregation Status: DISABLED
Diagnostics: Exception was thrown 5 time(s) from Reporter thread.
Unmanaged Application: false
Application Node Label expression: <Not set>
AM container Node Label expression: <DEFAULT_PARTITION>
Thanks,
Wojciech
From: Wojciech Kowalski
Sent: 23 May 2018 11:26
To: Ambuj Sharma; user@predictionio.apache.org
Subject: RE: Problem with training in yarn cluster
Hi,
Ok so full command now is:
pio train --scratch-uri hdfs://pio-cluster-m/pio -- --executor-memory 4g --driver-memory 4g --deploy-mode cluster --master yarn
The errors stopped after removing --executor-cores 2 --driver-cores 2.
I found this error: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=4, maxVirtualCores=2
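For what it's worth, that exception corresponds to a real YARN scheduler setting: `yarn.scheduler.maximum-allocation-vcores` caps the vcores any single container may request (2 in this cluster, hence requestedVirtualCores=4 being rejected). A hedged sketch of raising the cap in yarn-site.xml, assuming the nodes actually have the cores to spare:

```xml
<!-- yarn-site.xml on the ResourceManager: raise the per-container vcore cap
     so requests above 2 vcores are admissible. Restart the ResourceManager
     after changing it. The value 4 is just an example. -->
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```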
But now I have a problem with HBase :/
I have the HBase host set:
declare -x PIO_STORAGE_SOURCES_HBASE_HOSTS="pio-gc"
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.actionml.DataSource@2fdb4e2e
[INFO] [Engine$] Preparator: com.actionml.Preparator@d257dd4
[INFO] [Engine$] AlgorithmList: List(com.actionml.URAlgorithm@400bbb7)
[INFO] [Engine$] Data sanity check is on.
[ERROR] [StorageClient] HBase master is not running (ZooKeeper ensemble: pio-cluster-m). Please make sure that HBase is running properly, and that the configuration is pointing at the correct ZooKeeper ensemble.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1645)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1671)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1878)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isMasterRunning(HConnectionManager.java:894)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2366)
at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:364)
at org.apache.predictionio.data.storage.Storage$.getPDataObject(Storage.scala:307)
at org.apache.predictionio.data.storage.Storage$.getPEvents(Storage.scala:454)
at org.apache.predictionio.data.store.PEventStore$.eventsDb$lzycompute(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.eventsDb(PEventStore.scala:37)
at org.apache.predictionio.data.store.PEventStore$.find(PEventStore.scala:73)
at com.actionml.DataSource.readTraining(DataSource.scala:76)
at com.actionml.DataSource.readTraining(DataSource.scala:48)
at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
... 41 more
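A hedged reading of the trace above: the HBase master registers its own hostname (`hbase-master`, presumably the Docker container name) in ZooKeeper, and clients then connect to that advertised name, so a correct PIO_STORAGE_SOURCES_HBASE_HOSTS is not enough; every YARN node must also be able to resolve it. A quick check to run on a worker node:

```shell
# Can this node resolve the hostname the HBase master advertised in ZooKeeper?
if getent hosts hbase-master > /dev/null; then
  echo "hbase-master resolves"
else
  echo "hbase-master unresolvable: add a DNS or /etc/hosts entry, or give the master a resolvable hostname"
fi
```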
RE: Problem with training in yarn cluster
Posted by Wojciech Kowalski <wo...@tomandco.co.uk>.
at com.actionml.DataSource.readTraining(DataSource.scala:76)
at com.actionml.DataSource.readTraining(DataSource.scala:48)
at org.apache.predictionio.controller.PDataSource.readTrainingBase(PDataSource.scala:40)
at org.apache.predictionio.controller.Engine$.train(Engine.scala:642)
at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:42561)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(HConnectionManager.java:1682)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1591)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1617)
... 36 more
Caused by: java.net.UnknownHostException: unknown host: hbase-master
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
... 41 more
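The UnknownHostException above typically means the YARN node running the driver cannot resolve the hostname `hbase-master` that the HBase master registered in ZooKeeper, so the fix lives in DNS or /etc/hosts on every node that runs containers. A quick, hypothetical diagnostic (not part of PIO) to run on a node to see which configured hostnames fail to resolve:

```python
import socket

def check_hosts(hosts):
    """Return the subset of hosts that fail DNS/hosts-file resolution."""
    unresolved = []
    for h in hosts:
        try:
            socket.getaddrinfo(h, None)
        except socket.gaierror:
            unresolved.append(h)
    return unresolved

# "localhost" should always resolve; "hbase-master" will show up as
# unresolved on any node lacking a DNS or /etc/hosts entry for it.
print(check_hosts(["localhost"]))  # → []
```

Any name reported by this check needs an entry in DNS or /etc/hosts on that node (or HBase reconfigured to register a resolvable hostname).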
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
Subject: RE: Problem with training in yarn cluster
Hello,
Actually I have another error in logs that is actually preventing train as well:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:
Thanks,
Wojciech
Re: Problem with training in yarn cluster
Posted by Ambuj Sharma <am...@getamplify.com>.
Hi Wojciech,
I also faced many problems while setting up YARN with PredictionIO. This may
be a case where YARN is trying to find the pio.log file on the HDFS cluster.
You can try "--master yarn --deploy-mode client". You need to pass this
configuration with pio train,
e.g., pio train -- --master yarn --deploy-mode client
Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout
RE: Problem with training in yarn cluster
Posted by Pat Ferrel <pa...@occamsmachete.com>.
Actually you might search the archives for “yarn” because I don’t recall how the setup works off hand.
Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org
Also check the Spark YARN requirements and remember that `pio train … -- various Spark params` allows you to pass arbitrary Spark params exactly as you would to spark-submit on the pio command line. The double dash separates PIO and Spark params.
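The double-dash convention can be sketched as follows; this is only an illustration of the splitting rule, not PIO's actual argument parser:

```python
def split_pio_args(argv):
    """Split a `pio train` argument list at the first `--`:
    everything before it goes to pio itself, everything after it
    is passed through unchanged to spark-submit."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

pio_args, spark_args = split_pio_args(
    ["train", "--scratch-uri", "hdfs://pio-cluster-m/pio",
     "--", "--master", "yarn", "--deploy-mode", "cluster"])
print(spark_args)  # → ['--master', 'yarn', '--deploy-mode', 'cluster']
```

So flags like `--executor-memory 4g` placed after the `--` reach Spark exactly as they would on a spark-submit command line.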
RE: Problem with training in yarn cluster
Posted by Pat Ferrel <pa...@occamsmachete.com>.
What is the command line for `pio train …`? Specifically, are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this.
From: Wojciech Kowalski <wo...@tomandco.co.uk>
Reply: user@predictionio.apache.org <us...@predictionio.apache.org>
Date: May 22, 2018 at 2:28:43 PM
To: user@predictionio.apache.org <us...@predictionio.apache.org>
Subject: RE: Problem with training in yarn cluster
Hello,
Actually I have another error in logs that is actually preventing train as well:
[INFO] [RecommendationEngine$]
_ _ __ __ _
/\ | | (_) | \/ | |
/ \ ___| |_ _ ___ _ __ | \ / | |
/ /\ \ / __| __| |/ _ \| '_ \| |\/| | |
/ ____ \ (__| |_| | (_) | | | | | | | |____
/_/ \_\___|\__|_|\___/|_| |_|_| |_|______|
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6774ms
[INFO] [Server] jetty-9.2.z-SNAPSHOT
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
[INFO] [Server] Started @7040ms
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[ERROR] [ApplicationMaster] Uncaught exception:
Thanks,
Wojciech
From: Wojciech Kowalski
Sent: 22 May 2018 23:20
To: user@predictionio.apache.org
Subject: Problem with training in yarn cluster
Hello, I am trying to set up a distributed cluster with all of the services separated, but I have a problem while running training:
Setup:
HBase
Hadoop
HDFS
Spark cluster with YARN
Training in cluster mode
I assume the Spark worker is trying to save the log to /pio/pio.log on the worker machine instead of on the PIO host. How can I point the PIO log destination to an HDFS path? Or is there any other advice?
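One possible workaround (a sketch only; the conf/log4j.properties location and appender names are assumptions based on a typical PredictionIO layout and may differ in your installation) is to stop log4j from writing to a local file in cluster mode and log to the console instead, so the output lands in the YARN container logs rather than a local /pio path that does not exist on the workers:

```properties
# conf/log4j.properties (sketch; adjust logger/appender names to your install)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{2} - %m%n
```

With YARN log aggregation enabled, the container logs are collected into HDFS and can be read back with `yarn logs -applicationId <applicationId>`, which indirectly gives you HDFS-backed logs for the training run.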
Thanks,
Wojciech