Posted to user@kylin.apache.org by Chao Long <wa...@qq.com> on 2018/12/12 02:23:42 UTC
Re: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
The OOM at step 3 may be caused by a UHC (ultra high cardinality) column. Are there any UHC columns with a global dictionary (bitmap measure, TopN measure)? If so, try setting "kylin.engine.mr.uhc-reducer-count" to a larger value (default 1) and setting "kylin.engine.mr.build-uhc-dict-in-additional-step=true".
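One rough way to tell whether a column is UHC is to count its distinct values over a sample of the intermediate flat table. The Python sketch below assumes the sample rows are already available as tuples, for example exported from a Hive query. `UHC_THRESHOLD` and the helper names are hypothetical and are not Kylin settings. On a full table, a `COUNT(DISTINCT col)` query in Hive is the more practical check.

```python
from collections import defaultdict

# Hypothetical cut-off for illustration only; this is NOT a Kylin parameter.
UHC_THRESHOLD = 1_000_000

def distinct_counts(rows, columns):
    """Count distinct values per column over an iterable of row tuples."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in zip(columns, row):
            seen[name].add(value)
    return {name: len(seen[name]) for name in columns}

def uhc_candidates(rows, columns, threshold=UHC_THRESHOLD):
    """Return (column, distinct_count) pairs at or above the threshold, highest first."""
    counts = distinct_counts(rows, columns)
    hits = [(col, n) for col, n in counts.items() if n >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Keeping every value in a set only works on a sample. For production-size tables, push the counting down to Hive.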
------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Wednesday, December 12, 2018 4:21 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
Subject: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Chao:
The failure of my Spark cube build at step 3 was caused by running out of memory.
I found the application ID in kylin.log, then used the following command to fetch the YARN log:
yarn logs -applicationId application_1544204485929_0069 >tmp11
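When kylin.log is large, the application ID can be extracted with a small script instead of by hand. The sketch below assumes only that the ID appears somewhere in the log text in YARN's usual application_<clusterTimestamp>_<sequence> form. The function names are illustrative.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")

def find_application_ids(log_text):
    """Return the distinct YARN application IDs in the order they first appear."""
    ids = []
    for match in APP_ID_RE.finditer(log_text):
        if match.group(0) not in ids:
            ids.append(match.group(0))
    return ids

def yarn_logs_command(app_id, out_file):
    """Build a 'yarn logs' command that dumps the aggregated log to a file."""
    return f"yarn logs -applicationId {app_id} > {out_file}"
```

Container IDs such as container_1544204485929_0069_01_000022 do not match the pattern, so only application IDs are returned.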
In the log, I saw something like:
2018-12-10 19:20:42,807 WARN netty.NettyRpcEnv: Ignored failure: java.util.concurrent.TimeoutException: Cannot receive any reply in 10 seconds
2018-12-10 19:20:42,959 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
2018-12-10 19:20:42,959 ERROR util.Utils: Uncaught exception in thread driver-heartbeater
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.StringBuffer.toString(StringBuffer.java:671)
or:
2018-12-11 19:15:31,295 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
2018-12-11 19:15:31,293 WARN nio.NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.valueOf(Integer.java:832)
at sun.nio.ch.EPollSelectorImpl.updateSelectedKeys(EPollSelectorImpl.java:106)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:84)
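When the dumped YARN log (tmp11 above) is long, a quick tally of the OutOfMemoryError messages shows which kind of exhaustion dominates, for example "GC overhead limit exceeded" vs. "Java heap space". Below is a minimal sketch, assuming the log is read line by line. The function name is hypothetical.

```python
import re

# Matches the JVM error line, e.g. "java.lang.OutOfMemoryError: Java heap space".
OOM_RE = re.compile(r"java\.lang\.OutOfMemoryError: (.+)")

def oom_summary(log_lines):
    """Count OutOfMemoryError occurrences grouped by their detail message."""
    counts = {}
    for line in log_lines:
        match = OOM_RE.search(line)
        if match:
            reason = match.group(1).strip()
            counts[reason] = counts.get(reason, 0) + 1
    return counts
```

Usage: `with open("tmp11") as f: print(oom_summary(f))`.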
I changed this line in kylin.properties from:
kylin.engine.spark-conf.spark.executor.memory=4G
to:
kylin.engine.spark-conf.spark.executor.memory=40G
The cube build finished OK.
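Raising spark.executor.memory also raises what each executor requests from YARN. As a rough sketch, the container request is the executor memory plus the overhead. The logged spark-submit command elsewhere in this thread sets spark.yarn.executor.memoryOverhead=1024 (MiB) and spark.executor.instances=40. When the overhead is not set, Spark 2.x defaults it to max(384 MiB, 10% of executor memory).

The sketch has some limitations:
- It ignores YARN rounding requests up to yarn.scheduler.minimum-allocation-mb.
- With dynamic allocation enabled, the real executor count varies.
- The function names are hypothetical.

```python
# MiB per unit suffix, as used in Spark memory strings such as "4G" or "512m".
UNIT_MB = {"m": 1, "g": 1024, "t": 1024 * 1024}

def to_mb(value):
    """Convert a suffixed memory string like '4G' or '512m' to MiB."""
    value = value.strip().lower()
    number, unit = value[:-1], value[-1]
    if unit not in UNIT_MB:
        raise ValueError(f"expected an m/g/t suffix, got {value!r}")
    return int(float(number) * UNIT_MB[unit])

def container_mb(executor_memory, overhead_mb=None):
    """Approximate YARN memory request (MiB) for one Spark executor.

    If no overhead is given, use the Spark 2.x default:
    max(384 MiB, 10% of the executor memory).
    """
    heap = to_mb(executor_memory)
    if overhead_mb is None:
        overhead_mb = max(384, int(heap * 0.10))
    return heap + overhead_mb

def total_gb(executor_memory, overhead_mb, instances):
    """Approximate total GiB requested by `instances` executors."""
    return container_mb(executor_memory, overhead_mb) * instances / 1024
```

Under these assumptions, each 40G executor needs about 41 GiB from YARN (41984 MiB). That request has to fit under yarn.scheduler.maximum-allocation-mb and on a single NodeManager.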
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Monday, December 10, 2018 3:13 PM
To: user@kylin.apache.org
Subject: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Chao:
Did you set “kylin.source.hive.flat-table-storage-format” to SEQUENCEFILE or to some other value? When I set it to SEQUENCEFILE, the cube build succeeds. I will try your suggestion and see if it works.
I am seeing another problem with the Spark cube build. At step 3, some executors failed, and I am wondering how to find the root cause. Here is the log from stderr:
2018-12-10 18:44:39,776 INFO scheduler.TaskSetManager: Finished task 28.0 in stage 0.0 (TID 21) in 46644 ms on hadoop3 (executor 34) (28/31)
2018-12-10 18:44:39,783 INFO yarn.YarnAllocator: Driver requested a total number of 3 executor(s).
2018-12-10 18:46:02,679 INFO scheduler.TaskSetManager: Finished task 21.0 in stage 0.0 (TID 19) in 129550 ms on hadoop5 (executor 18) (29/31)
2018-12-10 18:46:02,778 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
2018-12-10 18:53:55,116 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 21.
2018-12-10 18:53:55,125 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 0)
2018-12-10 18:53:55,126 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
2018-12-10 18:53:55,128 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(21, hadoop2, 40455, None)
2018-12-10 18:53:55,129 INFO storage.BlockManagerMaster: Removed 21 successfully in removeExecutor
2018-12-10 18:53:55,330 INFO yarn.YarnAllocator: Completed container container_1544204485929_0069_01_000022 on host: hadoop2 (state: COMPLETE, exit status: 143)
2018-12-10 18:53:55,333 WARN yarn.YarnAllocator: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
2018-12-10 18:53:55,338 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
2018-12-10 18:53:55,342 ERROR cluster.YarnClusterScheduler: Lost executor 21 on hadoop2: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
2018-12-10 18:53:55,348 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 (TID 26, hadoop2, executor 21): ExecutorLostFailure (executor 21 exited caused by one of the running tasks) Reason: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
2018-12-10 18:53:55,351 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
2018-12-10 18:53:55,351 INFO storage.BlockManagerMaster: Removal of executor 21 requested
2018-12-10 18:53:55,352 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 21
2018-12-10 18:53:55,360 INFO spark.ExecutorAllocationManager: Existing executor 21 has been removed (new total is 39)
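To see the pattern across many executors, the "Lost executor" lines can be grouped by host and exit status. Exit status 143 follows the usual 128 + signal-number convention: 128 + 15 (SIGTERM), which matches "Killed by external signal". The signal alone does not say why the container was killed, so the root cause has to be found in the executor's own YARN log. Below is a minimal sketch; the regular expression is based only on the log lines above.

```python
import re
import signal

# Based on lines like "Lost executor 21 on hadoop2: ... Exit status: 143."
LOST_RE = re.compile(r"Lost executor (\d+) on ([^:]+):.*?Exit status: (-?\d+)")

def lost_executors(log_lines):
    """Map (host, exit_status) -> sorted executor IDs from 'Lost executor' lines."""
    found = {}
    for line in log_lines:
        match = LOST_RE.search(line)
        if match:
            executor, host, status = int(match.group(1)), match.group(2), int(match.group(3))
            found.setdefault((host, status), set()).add(executor)
    return {key: sorted(ids) for key, ids in found.items()}

def describe_exit(status):
    """Exit codes above 128 conventionally mean 'killed by signal (status - 128)'."""
    if status > 128:
        try:
            return f"killed by signal {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number; fall through to the plain message
    return f"exited with status {status}"
```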
Thanks.
Kang-sen
From: Chao Long <wa...@qq.com>
Sent: Monday, December 10, 2018 11:15 AM
To: user <us...@kylin.apache.org>
Subject: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
In my environment (hdp-2.4.0.0-169, hive-1.2.1000.2.4.0.0-169), copying /usr/hdp/2.4.0.0-169/hive/conf/hive-site.xml to $KYLIN_HOME/spark/conf fixes the "table not found in database" problem. So I think this could be an environment issue.
You can try running the Spark command manually from the CLI (adding --files /xx/xx/hive-site.xml) and see whether you get the same error.
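Editing the long logged command by hand is error-prone. The small sketch below inserts spark-submit's `--files` option right after the spark-submit executable, which keeps it ahead of the application jar, where spark-submit expects its options. It assumes the command does not already contain a `--files` option. `add_files_option` is a hypothetical helper, and the paths in the example are placeholders.

```python
import re
import shlex

def add_files_option(cmd, path):
    """Insert '--files <path>' immediately after the spark-submit executable.

    spark-submit options must precede the application jar, so placing the
    new option right after the executable keeps it in the options section.
    """
    pattern = re.compile(r"(\S*spark-submit)(\s)")
    if not pattern.search(cmd):
        raise ValueError("no spark-submit invocation found")
    return pattern.sub(
        lambda m: f"{m.group(1)} --files {shlex.quote(path)}{m.group(2)}", cmd, count=1
    )
```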
------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Monday, December 10, 2018 8:58 PM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
Subject: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Chao: (I hope I got your first name correctly.)
Thanks for the reply. I see that KYLIN-3699 was opened to address this problem.
I believe no issue has been opened for the problem that the Spark cube build supports only SEQUENCEFILE. Is that right?
Kang-sen
From: Chao Long <wa...@qq.com>
Sent: Sunday, December 09, 2018 11:50 AM
To: user <us...@kylin.apache.org>
Subject: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi KangSen,
There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data:
https://issues.apache.org/jira/browse/KYLIN-3699
------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Saturday, December 8, 2018 5:32 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
I am able to build a cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.
I had to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
In addition, if I build a cube when there is no input data, the build fails at step 7; otherwise, it works.
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
The Spark cube build does not correctly support intermediate table formats other than SEQUENCEFILE.
In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
Then I restarted Kylin.
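This change, like the memory and UHC settings elsewhere in the thread, is a one-line edit to kylin.properties, so it can be scripted. Below is a minimal sketch, assuming the file uses the key=value form shown in this thread. Every active occurrence of the key is rewritten, because with java.util.Properties a later duplicate overrides an earlier one. Commented-out lines are left alone. `set_property` is a hypothetical helper, and Kylin still has to be restarted afterwards.

```python
import re

def set_property(text, key, value):
    """Return kylin.properties text with `key` set to `value`.

    Active 'key=value' lines are rewritten in place (all of them, since a later
    duplicate would otherwise win); if none exist, the setting is appended.
    Lines starting with '#' do not match and are left untouched.
    """
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{key}={value}", text)
    if count:
        return new_text
    separator = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{separator}{key}={value}\n"
```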
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds
The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I compared the Spark command logged in my kylin.log file with the one in the Kylin doc “Build Cube with Spark”, and I can see that my command is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”. Here is my command:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the command from the Kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
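Rather than comparing the two commands by eye, their spark-submit options can be diffed. The sketch below tokenizes each command with `shlex`, reads `--option value` pairs until the application jar, and reports which options appear in only one command. It assumes every option takes exactly one value. That holds for the options in these two commands, but not for value-less flags such as `--verbose`. The helper names are hypothetical.

```python
import shlex

def spark_submit_options(cmd):
    """Return (conf, flags) parsed from a logged spark-submit command line.

    conf maps '--conf key=value' keys to values; flags maps other '--option'
    names to their single value. Parsing stops at the first token after
    spark-submit that does not start with '--' (the application jar).
    """
    tokens = shlex.split(cmd)
    start = next((i for i, t in enumerate(tokens) if t.endswith("spark-submit")), None)
    if start is None:
        raise ValueError("no spark-submit invocation found")
    conf, flags = {}, {}
    i = start + 1
    while i + 1 < len(tokens) and tokens[i].startswith("--"):
        name, value = tokens[i], tokens[i + 1]
        if name == "--conf":
            key, _, val = value.partition("=")
            conf[key] = val
        else:
            flags[name] = value
        i += 2
    return conf, flags

def diff_options(mine, reference):
    """List options present in the reference command but not in mine, and vice versa."""
    conf_a, flags_a = spark_submit_options(mine)
    conf_b, flags_b = spark_submit_options(reference)
    return {
        "flags_missing": sorted(set(flags_b) - set(flags_a)),
        "flags_extra": sorted(set(flags_a) - set(flags_b)),
        "conf_missing": sorted(set(conf_b) - set(conf_a)),
    }
```

The diff shows which options differ, not which Kylin setting produces them.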
My question is: what config parameter can cause this difference?
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on the Hive config data in my Ambari.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
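Below is a small sketch for sanity-checking a hive-site.xml before copying it around. It assumes a complete file, in which the properties are wrapped in a <configuration> root element; the excerpt above omits that element. Applied to the values above, it would flag that ConnectionURL points at PostgreSQL while ConnectionDriverName is Derby's EmbeddedDriver. A clean result only says the file content is consistent, not that the Spark driver actually reads the file. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def hive_site_properties(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        props[name] = prop.findtext("value", default="").strip()
    return props

def metastore_warnings(props):
    """Flag settings that keep the Hive client away from a remote metastore."""
    warnings = []
    if not props.get("hive.metastore.uris"):
        warnings.append("hive.metastore.uris is empty: the client opens the "
                        "metastore database directly instead of a metastore service")
    driver = props.get("javax.jdo.option.ConnectionDriverName", "")
    url = props.get("javax.jdo.option.ConnectionURL", "")
    if "derby" in driver.lower() and not url.startswith("jdbc:derby:"):
        warnings.append(f"Derby driver {driver} does not match ConnectionURL {url}")
    return warnings
```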
But in the Spark stderr, I still see that Spark thinks the metastore is DERBY:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the Spark job that builds the cube somehow does not pick up hive-site.xml from the …/spark/conf directory?
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to give Spark access to the Hive table that was built by Kylin.
I searched the internet for Spark and Hive integration, but I failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Then I restarted Kylin, but I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
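Two lines in this log suggest that the driver did not see hive.metastore.uris and created a fresh local metastore: "underlying DB is DERBY" and a warehouse path under file:...spark-warehouse. That would also explain "Failed to get database default" followed by the table not being found. Below is a minimal sketch that checks a driver log for those two hints; the hint names are arbitrary labels.

```python
import re

# Hints that the Spark driver is using a local/embedded metastore.
EMBEDDED_HINTS = {
    "derby": re.compile(r"underlying DB is DERBY"),
    "local_warehouse": re.compile(r"Warehouse (?:path|location).*?file:"),
}

def metastore_symptoms(log_lines):
    """Return the sorted names of the embedded-metastore hints found in the log."""
    found = set()
    for line in log_lines:
        for name, pattern in EMBEDDED_HINTS.items():
            if pattern.search(line):
                found.add(name)
    return sorted(found)
```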
My question is: why is Spark not able to find the Hive metastore location?
If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear. Try searching for Spark + Hive.
If possible, we suggest using SequenceFile (the default config) for the intermediate Hive table.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
On Monday, December 3, 2018 9:33 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not SequenceFile, Kylin uses the Hive catalog to parse the data into an RDD. In this case, it needs "hive-site.xml" in the spark/conf folder. Please confirm whether this is your case; if so, put the file there and try again.
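The rule described above can be turned into a pre-flight check. Below is a minimal sketch, assuming kylin.properties uses key=value lines. It treats SEQUENCEFILE as the default when the key is absent, which is the default according to this thread. `preflight` and `read_properties` are hypothetical helpers. A missing hive-site.xml is one cause of the "table not found" error, so a passing check does not rule out others.

```python
import os

SEQUENCEFILE = "SEQUENCEFILE"
FORMAT_KEY = "kylin.source.hive.flat-table-storage-format"

def read_properties(text):
    """Minimal key=value reader that skips blank lines and '#'/'!' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "!")) and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def preflight(properties_text, spark_conf_dir):
    """Return a warning if the flat table is not SequenceFile and hive-site.xml is absent."""
    fmt = read_properties(properties_text).get(FORMAT_KEY, SEQUENCEFILE).upper()
    has_hive_site = os.path.isfile(os.path.join(spark_conf_dir, "hive-site.xml"))
    if fmt != SEQUENCEFILE and not has_hive_site:
        return f"{FORMAT_KEY}={fmt} needs hive-site.xml in {spark_conf_dir}"
    return None
```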
Best regards,
Shaofeng Shi 史少锋
On Saturday, December 1, 2018 12:30 AM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Hi, Shaofeng:
With your suggestion I made some progress. Step 3 of the cube build now gets further but shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
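In the stderr log above, "Using direct SQL, underlying DB is DERBY" and a Warehouse path starting with `file:` are consistent with Spark SQL having created its own empty local metastore instead of connecting to the cluster's Hive metastore, which would explain the "Table or view ... not found" error. A minimal sketch for checking a log saved with `yarn logs -applicationId <application id> > app.log` for those two strings; the helper name is hypothetical, and the strings are taken from this one Spark 2.x log, so treat a match as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a saved YARN application log shows signs
# that Spark SQL used a local Derby metastore (strings copied from the
# Spark 2.x log quoted in this thread).
looks_like_embedded_metastore() {
  local log="$1"
  grep -q 'underlying DB is DERBY' "$log" ||
    grep -q "Warehouse path is 'file:" "$log"
}

# Usage (application id is a placeholder):
# yarn logs -applicationId <application id> > app.log
# looks_like_embedded_metastore app.log && echo "hive-site.xml probably not picked up"
```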
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
One solution is to put a "java-opts" file in the spark/conf folder that sets the 'hdp.version' property, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
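A minimal sketch that writes this file. It assumes an HDP node where `hdp-select status hadoop-client` prints a line such as `hadoop-client - 2.5.6.0-40`; that output format, the helper names and the target path are assumptions to check on your cluster. A plausible reason this works where the kylin.properties options did not (not confirmed in this thread): the Spark 2.x launcher reads conf/java-opts into the spark-submit JVM, which is the process that builds the application master's classpath for YARN, whereas the extraJavaOptions settings only apply to JVMs started later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the version from a line such as
# "hadoop-client - 2.5.6.0-40" (assumed hdp-select output format).
hdp_version_from_status() {
  awk '{ print $3 }'
}

# Hypothetical helper: write <spark conf dir>/java-opts containing the
# -Dhdp.version system property. Overwrites any existing java-opts file.
write_java_opts() {
  local conf_dir="$1" version="$2"
  if [ -z "$version" ]; then
    echo "empty HDP version" >&2
    return 1
  fi
  mkdir -p "$conf_dir" || return 1
  printf '%s\n' "-Dhdp.version=$version" > "$conf_dir/java-opts"
}

# Usage on an HDP node (command output format and path are assumptions):
# version=$(hdp-select status hadoop-client | hdp_version_from_status)
# write_java_opts "$KYLIN_HOME/spark/conf" "$version"
# cat "$KYLIN_HOME/spark/conf/java-opts"
```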
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
On Fri, Nov 30, 2018 at 9:04 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Thanks for the replies from Yichen and Aron. Here is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is that "${hdp.version}" is somehow not being resolved. It seems that kylin.properties parameters like "extraJavaOptions=-Dhdp.version=2.5.6.0-40" are not enough.
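The "bad substitution" message fits that suspicion. YARN runs launch_container.sh with bash, and `hdp.version` is not a valid shell variable name because of the dot, so a literal `${hdp.version}` left in the CLASSPATH line makes bash reject that line; with CLASSPATH never set, the "Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster" error above is plausibly what follows. The sketch below reproduces only the shell part by evaluating an `export CLASSPATH="..."` line in a child bash; the helper name and sample paths are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate a classpath the way launch_container.sh does,
# i.e. as an export line run by a fresh bash process. Returns non-zero when
# bash cannot expand it (for example "${hdp.version}": bad substitution).
classpath_line_ok() {
  local classpath="$1"
  # The outer double quotes insert the text verbatim; the child bash then
  # performs parameter expansion on any ${...} sequences inside it.
  bash -c "export CLASSPATH=\"$classpath\"" 2>/dev/null
}

# Unresolved placeholder, as in the failing launch_container.sh line 26 (fails):
# classpath_line_ok '/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar'
# Placeholder already replaced with the cluster's HDP version (succeeds):
# classpath_line_ok '/usr/hdp/2.5.6.0-40/hadoop/lib/hadoop-lzo-0.6.0.2.5.6.0-40.jar'
```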
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think JiaTao is right. If you want to use Spark to build cubes in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties:
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
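The three HDP settings can also be enabled from the shell. A minimal sketch, assuming the commented-out template lines appear in kylin.properties one per line, exactly as in the template; the version to write is a parameter (Yichen's template uses `current`, Kang-Sen used `2.5.6.0-40`), the helper name is hypothetical, and a .bak copy of the file is kept. Note that the submit log in the first message of this thread warns that spark.yarn.am.extraJavaOptions will not take effect in cluster mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the three HDP lines of the kylin.properties
# template and set -Dhdp.version to the given value (which must not contain
# '/' or '&'). The original file is kept as <file>.bak; the -i.bak form
# works with both GNU and BSD sed.
enable_hdp_opts() {
  local props="$1" version="$2"
  sed -i.bak -E \
    "s/^#(kylin\.engine\.spark-conf\.spark\.(driver|yarn\.am|executor)\.extraJavaOptions)=-Dhdp\.version=.*/\1=-Dhdp.version=${version}/" \
    "$props"
}

# Usage (version string is the one used in this thread):
# enable_hdp_opts "$KYLIN_HOME/conf/kylin.properties" 2.5.6.0-40
# grep 'hdp.version' "$KYLIN_HOME/conf/kylin.properties"
```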
Regards,
Yichen
On Fri, Nov 30, 2018 at 9:57 AM, JiaTao Tao <ta...@gmail.com> wrote:
Hi
I searched online and found these links; give them a try, and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
On Thu, Nov 29, 2018 at 3:11 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
We are running Kylin 2.5.1. For one particular cube, building one hour of data took 200 minutes, so I am considering building the cube with Spark instead of MapReduce.
I selected Spark as the engine in the Advanced Setting step of the cube designer.
The cube build failed at step 3 with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen