Posted to user@kylin.apache.org by Chao Long <wa...@qq.com> on 2018/12/12 02:23:42 UTC
Re: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-Sen,
   The step 3 OOM may be caused by a UHC (ultra-high-cardinality) column. Are there any UHC columns with a global dictionary (bitmap measure, topN measure)? If so, you can try setting "kylin.engine.mr.uhc-reducer-count" to a larger value (the default is 1) and setting "kylin.engine.mr.build-uhc-dict-in-additional-step"=true.



------------------
Best Regards,
Chao Long


 
------------------ Original message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Wednesday, December 12, 2018, 4:21 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;

Subject: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?



  
Hi, Chao:
 
 
 
My cube build with Spark failed at step 3 because it ran out of memory.
 
 
 
In kylin.log, I found the application ID, then used the following command to fetch the YARN log:
 
 
 
yarn logs -applicationId application_1544204485929_0069 >tmp11
 
 
 
In the log, I saw something like:
 
 
 
2018-12-10 19:20:42,807 WARN netty.NettyRpcEnv: Ignored failure: java.util.concurrent.TimeoutException: Cannot receive any reply in 10 seconds
 
2018-12-10 19:20:42,959 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
 
2018-12-10 19:20:42,959 ERROR util.Utils: Uncaught exception in thread driver-heartbeater
 
java.lang.OutOfMemoryError: GC overhead limit exceeded
 
       at java.util.Arrays.copyOfRange(Arrays.java:3664)
 
        at java.lang.StringBuffer.toString(StringBuffer.java:671)
 
 
 
or:
 
 
 
2018-12-11 19:15:31,295 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
 
2018-12-11 19:15:31,293 WARN nio.NioEventLoop: Unexpected exception in the selector loop.
 
java.lang.OutOfMemoryError: Java heap space
 
        at java.lang.Integer.valueOf(Integer.java:832)
 
        at sun.nio.ch.EPollSelectorImpl.updateSelectedKeys(EPollSelectorImpl.java:106)
 
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:84)
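
When the aggregated YARN log is large, the out-of-memory lines can be pulled out with a short script. This is only a sketch: find_oom_errors is a hypothetical helper, the pattern is taken from the two excerpts above, and the sample lines stand in for the contents of a file such as tmp11.

```python
import re

# Pattern based on the excerpts in this message; other JVM failure modes
# would need additional patterns.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError: (.+)")


def find_oom_errors(log_lines):
    """Return (line_number, reason) for every OutOfMemoryError line."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1).strip()))
    return hits


# Sample lines copied from the log excerpts; in practice read them from the
# file produced by "yarn logs", e.g. open("tmp11", errors="replace").
sample = [
    "2018-12-10 19:20:42,959 ERROR util.Utils: Uncaught exception in thread driver-heartbeater",
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "        at java.util.Arrays.copyOfRange(Arrays.java:3664)",
    "java.lang.OutOfMemoryError: Java heap space",
]

for number, reason in find_oom_errors(sample):
    print(number, reason)
```

The line numbers make it easier to jump to the surrounding stack trace and see which executor or thread ran out of memory.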
 
 
 
I changed the kylin.properties:
 
 
 
kylin.engine.spark-conf.spark.executor.memory=4G
 
 
 
to:
 
 
 
kylin.engine.spark-conf.spark.executor.memory=40G
 
 
 
The cube build finished OK.
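
For a sense of what this change asks of YARN: each executor container is requested as the executor heap plus the memory overhead. The sketch below does that arithmetic, assuming spark.yarn.executor.memoryOverhead stays at the 1024 (MB) shown in the spark-submit command logged elsewhere in this thread. parse_size_mb and container_request_mb are hypothetical helpers, and the parser only understands the G and M suffixes used here.

```python
def parse_size_mb(value):
    """Convert '4G', '512M' or a bare number (taken as MB) to MB."""
    value = value.strip().upper()
    if value.endswith("G"):
        return int(value[:-1]) * 1024
    if value.endswith("M"):
        return int(value[:-1])
    return int(value)


def container_request_mb(executor_memory, memory_overhead):
    """Executor heap plus overhead, before any rounding done by YARN."""
    return parse_size_mb(executor_memory) + parse_size_mb(memory_overhead)


before = container_request_mb("4G", "1024")    # 4096 + 1024 = 5120
after = container_request_mb("40G", "1024")    # 40960 + 1024 = 41984
print(before, after)
```

YARN may round the request up to its allocation increment, and the result has to fit within yarn.scheduler.maximum-allocation-mb on the cluster, so a tenfold increase per executor is worth checking against node capacity.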
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Monday, December 10, 2018 3:13 PM
 To: user@kylin.apache.org
 Subject: RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Chao:
 
 
 
Did you set “kylin.source.hive.flat-table-storage-format” to SEQUENCEFILE or to some other value? If I set it to SEQUENCEFILE, the cube build is OK. I will try your suggestion and see if it works.
 
 
 
I am seeing another problem with the Spark cube build. At step 3, some executors failed, and I am wondering how to find the root cause. Here is the log from stderr:
 
 
 
2018-12-10 18:44:39,776 INFO scheduler.TaskSetManager: Finished task 28.0 in stage 0.0 (TID 21) in 46644 ms on hadoop3 (executor 34) (28/31)
 
2018-12-10 18:44:39,783 INFO yarn.YarnAllocator: Driver requested a total number of 3 executor(s).
 
2018-12-10 18:46:02,679 INFO scheduler.TaskSetManager: Finished task 21.0 in stage 0.0 (TID 19) in 129550 ms on hadoop5 (executor 18) (29/31)
 
2018-12-10 18:46:02,778 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
 
2018-12-10 18:53:55,116 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 21.
 
2018-12-10 18:53:55,125 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 0)
 
2018-12-10 18:53:55,126 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
 
2018-12-10 18:53:55,128 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(21, hadoop2, 40455, None)
 
2018-12-10 18:53:55,129 INFO storage.BlockManagerMaster: Removed 21 successfully in removeExecutor
 
2018-12-10 18:53:55,330 INFO yarn.YarnAllocator: Completed container container_1544204485929_0069_01_000022 on host: hadoop2 (state: COMPLETE, exit status: 143)
 
2018-12-10 18:53:55,333 WARN yarn.YarnAllocator: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed  on request. Exit code is 143
 
Container exited with a non-zero exit code 143
 
Killed by external signal
 
 
 
2018-12-10 18:53:55,338 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status:  143. Diagnostics: Container killed on request. Exit code is 143
 
Container exited with a non-zero exit code 143
 
Killed by external signal
 
 
 
2018-12-10 18:53:55,342 ERROR cluster.YarnClusterScheduler: Lost executor 21 on hadoop2: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit  status: 143. Diagnostics: Container killed on request. Exit code is 143
 
Container exited with a non-zero exit code 143
 
Killed by external signal
 
 
 
2018-12-10 18:53:55,348 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 (TID 26, hadoop2, executor 21): ExecutorLostFailure (executor 21 exited caused by one of the running  tasks) Reason: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
 
Container exited with a non-zero exit code 143
 
Killed by external signal
 
 
 
2018-12-10 18:53:55,351 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
 
2018-12-10 18:53:55,351 INFO storage.BlockManagerMaster: Removal of executor 21 requested
 
2018-12-10 18:53:55,352 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 21
 
2018-12-10 18:53:55,360 INFO spark.ExecutorAllocationManager: Existing executor 21 has been removed (new total is 39)
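
On reading "Exit status: 143": by the usual convention, a status above 128 means the process was ended by signal (status - 128), and signal 15 is SIGTERM, which matches "Killed by external signal". The log does not say who sent the signal or why. A small sketch of the decoding, where describe_exit_status is a hypothetical helper:

```python
import signal


def describe_exit_status(status):
    """Decode an exit status using the common 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(143))
```

Since the status only shows that the container was terminated from outside, the executor's own log (fetched with yarn logs) is the place to look for the underlying reason.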
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
 
From: Chao Long <wa...@qq.com> 
 Sent: Monday, December 10, 2018 11:15 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
  
Hi Kang-Sen,
 
  
    In my environment (hdp-2.4.0.0-169, hive-1.2.1000.2.4.0.0-169), copying /usr/hdp/2.4.0.0-169/hive/conf/hive-site.xml to $KYLIN_HOME/spark/conf fixes the "table not found in database" problem. So I think this could be an environment issue.
 
  
 
 
  
   You can try running the Spark execution cmd manually in the CLI (adding --files /xx/xx/hive-site.xml) and see whether you get the same error.
 
  
 
 
   
------------------
 
   
Best Regards,
 
  
Chao Long
 
 
 
  
 
 
   
------------------ Original message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Monday, December 10, 2018, 8:58 PM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
Subject: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
  
 
 
  
Hi, Chao: (I hope I got your first name correctly.)
 
 
 
Thanks for the reply. I see that KYLIN-3699 was opened to address this problem.

I believe no bug has been opened for the problem that only SEQUENCEFILE is supported for the Spark cube build. Right?
 
 
 
Kang-sen
 
 
 
From: Chao Long <wa...@qq.com> 
 Sent: Sunday, December 09, 2018 11:50 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
  
Hi KangSen,
 
  
   There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data.
 
  
   https://issues.apache.org/jira/browse/KYLIN-3699
 
  
 
 
   
------------------
 
   
Best Regards,
 
  
Chao Long
 
 
 
   
------------------ Original message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
  
 
 
  
I am able to build a cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.
 
 
 
I need to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
 
 
 
In addition, if I build a cube when there is no input data, the cube build fails at step 7. Otherwise, it works OK.
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Friday, December 07, 2018 11:35 AM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
The Spark cube build does not correctly support non-SEQUENCEFILE formats.
 
 
 
In my kylin.properties, I changed from:
 
kylin.source.hive.flat-table-storage-format=TEXTFILE
 
to:
 
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
 
 
 
Then restarted kylin.
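
The same edit can be scripted when several Kylin instances need it. A minimal sketch, assuming kylin.properties uses plain key=value lines; set_property is a hypothetical helper, not part of Kylin:

```python
def set_property(text, key, value):
    """Return properties text with key set to value.

    Replaces an active (non-commented) entry if one exists,
    otherwise appends a new line at the end.
    """
    out, replaced = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


original = (
    "# flat table format\n"
    "kylin.source.hive.flat-table-storage-format=TEXTFILE\n"
)
updated = set_property(
    original, "kylin.source.hive.flat-table-storage-format", "SEQUENCEFILE"
)
print(updated)
```

The function only transforms text; writing the result back to the file and restarting Kylin are still separate steps.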
 
The spark cube build passed step3 and failed at step 7:
 
#7 Step Name: Build Cube with Spark
 Duration: 1.45 mins  Waiting: 0 seconds
 
 
 
The error is the same as reported by KYLIN-3699.
 
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Thursday, December 06, 2018 2:11 PM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I compared the Spark execution cmd logged in my kylin.log file with the one in the Kylin doc “build cube with spark”, and I can see that my cmd is missing this option:
 
 
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.

Here is my cmd:

2018-12-06 11:50:02,665 INFO  [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.cores=1  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec  --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.master=yarn  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true  --conf spark.executor.instances=40  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.executor.memory=4G  --conf spark.yarn.queue=default  --conf spark.submit.deployMode=cluster  --conf spark.dynamicAllocation.minExecutors=1  --conf spark.network.timeout=600  --conf spark.hadoop.dfs.replication=2  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.dynamicAllocation.executorIdleTimeout=300  --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history  --conf spark.driver.memory=2G  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.eventLog.enabled=true  --conf spark.shuffle.service.enabled=true  --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog  --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar  --conf spark.dynamicAllocation.maxExecutors=1000  --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata

And this is the cmd on kylin doc:

2017-03-06 14:44:38,574 INFO  [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit  --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  --conf spark.yarn.queue=default  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.driver.extraJavaOptions=-Dhdp.version=current  --conf spark.master=yarn  --conf spark.executor.extraJavaOptions=-Dhdp.version=current  --conf spark.executor.memory=1G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2  --conf spark.submit.deployMode=cluster  --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e  -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube

My question is “what config parameter can cause this difference”?

Thanks.

Kang-sen
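
Comparing two commands this long by eye is error-prone, so a mechanical comparison may help. The sketch below uses shortened versions of the two commands (only a few of the --conf entries are kept) and a hypothetical helper submit_options, which assumes every spark-submit option before the application jar takes exactly one value. Note that the two commands also run different classes (SparkFactDistinct and SparkCubingByLayer), so some differences may be expected.

```python
import shlex


def submit_options(cmd):
    """Collect spark-submit options that precede the application jar.

    Returns (conf, flags): the --conf key/value pairs, and every other
    "--name value" option. Parsing stops at the first token that does not
    start with "--", which is taken to be the application jar.
    """
    tokens = shlex.split(cmd)
    start = next(i for i, t in enumerate(tokens) if t.endswith("spark-submit")) + 1
    conf, flags = {}, {}
    i = start
    while i + 1 < len(tokens) and tokens[i].startswith("--"):
        name, value = tokens[i], tokens[i + 1]
        if name == "--conf":
            key, _, val = value.partition("=")
            conf[key] = val
        else:
            flags[name] = value
        i += 2
    return conf, flags


# Abbreviated from the two commands quoted in this message.
mine = (
    "export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && "
    "/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit "
    "--class org.apache.kylin.common.util.SparkEntry "
    "--conf spark.executor.memory=4G --conf spark.master=yarn "
    "--jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar "
    "/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar "
    "-className org.apache.kylin.engine.spark.SparkFactDistinct"
)
doc = (
    "export HADOOP_CONF_DIR=/etc/hadoop/conf && "
    "/usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit "
    "--class org.apache.kylin.common.util.SparkEntry "
    "--conf spark.executor.memory=1G --conf spark.master=yarn "
    "--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml "
    "/usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar "
    "-className org.apache.kylin.engine.spark.SparkCubingByLayer"
)

conf_mine, flags_mine = submit_options(mine)
conf_doc, flags_doc = submit_options(doc)

print("only in doc: ", sorted(set(flags_doc) - set(flags_mine)))
print("only in mine:", sorted(set(flags_mine) - set(flags_doc)))
changed = {k for k in conf_mine.keys() & conf_doc.keys() if conf_mine[k] != conf_doc[k]}
print("conf values that differ:", sorted(changed))
```

Run against the full commands, the same function would also list the --conf keys present on only one side, which narrows down which kylin.engine.spark-conf.* settings differ.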
 
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Wednesday, December 05, 2018 4:59 PM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionURL</name>
 
  <value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionDriverName</name>
 
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
 
  <description>Driver class name for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>hive.hwi.war.file</name>
 
  <value>/usr/lib/hive/lib/hive-hwi-.war</value>
 
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
 
</property>
 
 
 
<property>
 
  <name>hive.metastore.uris</name>
 
  <value>thrift://anovadata6.anovadata.local:9083</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>hive.metastore.warehouse.dir</name>
 
  <value>/apps/hive/warehouse</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
But in the Spark run's stderr, I still see that Spark thinks the metastore is DERBY:
 
 
 
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
 
 
Does that mean that the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
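
Before looking at Spark itself, it can be worth confirming what the copied file actually contains. A minimal sketch, assuming the file has the usual <configuration> root element around the <property> entries; metastore_settings is a hypothetical helper and the sample reuses two of the properties shown above. It only inspects the file and says nothing about whether Spark loaded it.

```python
import xml.etree.ElementTree as ET

# Properties that decide which metastore a Hive client talks to.
KEYS = (
    "hive.metastore.uris",
    "hive.metastore.warehouse.dir",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
)


def metastore_settings(xml_text):
    """Return the metastore-related properties found in hive-site.xml text."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in KEYS:
            found[name] = prop.findtext("value", default="").strip()
    return found


sample = """
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>
</configuration>
"""

settings = metastore_settings(sample)
for name in KEYS:
    print(name, "=", settings.get(name, "<not set>"))
```

For a real file, pass the text read from the copy under spark/conf.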
 
 
 
Kang-sen
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Wednesday, December 05, 2018 9:32 AM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I am not sure how to let Spark gain access to the Hive table that was built by Kylin.

I did search the internet for Spark and Hive integration, but I failed to find a concrete example.
 
 
 
Anyway, I updated my kylin/spark/conf/hive-site.xml, 
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionURL</name>
 
  <value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
And restarted kylin. But I still get the following error:
 
 
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
 
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
 
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
 
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
 
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
 
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
 
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases 
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=* 
 
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
 
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
 
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
 
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default    
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp 
 
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef 
 
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause:  Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
 
 
My question is: why is Spark not able to find the Hive metastore location?
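
Two lines in the log above suggest, though they do not prove, that the Spark application opened its own in-process metastore instead of using the cluster's: "Opening raw store" and "underlying DB is DERBY". A small sketch that flags them in a driver log; embedded_metastore_hints is a hypothetical helper and the interpretation attached to each marker is an assumption:

```python
# Marker text copied from the stderr excerpt; the meanings are an
# interpretation, not something the log states.
MARKERS = {
    "Opening raw store": "metastore ObjectStore opened inside the Spark application",
    "underlying DB is DERBY": "metastore backed by Derby",
}


def embedded_metastore_hints(log_lines):
    """Return the distinct hints found, in order of first appearance."""
    hints = []
    for line in log_lines:
        for marker, meaning in MARKERS.items():
            if marker in line and meaning not in hints:
                hints.append(meaning)
    return hints


sample = [
    "18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore",
    "18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY",
    "18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore",
]

for hint in embedded_metastore_hints(sample):
    print(hint)
```

If both hints appear, a freshly created Derby metastore would contain no tables, which is at least consistent with the "Table or view ... not found" error.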
 
 
 
If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
 
 
 
Kang-sen
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Monday, December 03, 2018 7:53 PM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
Just double-check it; the error message is clear. Try searching for Spark + Hive.
 
  
 
 
  
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
 
 

 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
Apache Kylin PMC
 
  
Work email: shaofeng.shi@kyligence.io
 
 
  
Kyligence Inc: https://kyligence.io/
 
  
 
 
  
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
 
  
Join Kylin user mail group: user-subscribe@kylin.apache.org
 
  
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
 
  
 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
   
On Mon, Dec 3, 2018 at 9:33 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
Hi, Shaofeng:
 
 
 
Thanks for the reply.
 
 
 
This is a line in my kylin.properties:
 
 
 
kylin.source.hive.flat-table-storage-format=TEXTFILE
 
 
 
I copied hive-site.xml into spark/conf and tried to resume the cube rebuild.
 
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
 
 
 
The cube build still failed; the stderr log is as follows:
 
 
 
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases 
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=* 
 
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
 
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
 
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
 
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default    
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp 
 
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef 
 
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause:  Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
        at java.lang.reflect.Method.invoke(Method.java:606)
 
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
 
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found  in database 'zetticsdw';
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at scala.Option.getOrElse(Option.scala:121)
 
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at  org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
 
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
 
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
 
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
 
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
 
        ... 6 more
 
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct.  Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
 
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
 
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
 
 
 
 
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Sunday, December 02, 2018 2:04 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
  
Hi Kang-sen,
 
When the intermediate table's file format is not SequenceFile, Kylin uses the Hive catalog to read the data into an RDD. In that case, Spark needs a copy of "hive-site.xml" in the spark/conf folder. Please check whether this applies to you; if so, add the file and try again.
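 
As a rough illustration of that check, here is a minimal Python sketch that copies hive-site.xml into the Spark conf folder only when it is missing. The function name and the directory arguments are assumptions for illustration, not part of Kylin; adjust them to your installation.

```python
import shutil
from pathlib import Path


def ensure_hive_site(hive_conf_dir: str, spark_conf_dir: str) -> bool:
    """Copy hive-site.xml into the Spark conf folder if it is missing.

    Returns True when a copy was made, False when the file was already there.
    Raises FileNotFoundError if the Hive conf folder has no hive-site.xml.
    """
    source = Path(hive_conf_dir) / "hive-site.xml"
    target = Path(spark_conf_dir) / "hive-site.xml"
    if target.exists():
        return False  # nothing to do, Spark can already see the file
    if not source.is_file():
        raise FileNotFoundError(f"no hive-site.xml under {hive_conf_dir}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return True
```

A call might look like ensure_hive_site("/etc/hive/conf", "/usr/local/spark/conf"), where both paths are hypothetical and depend on where Hive and the Spark used by Kylin are installed.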
 
  

 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
Apache Kylin PMC
 
  
Work email: shaofeng.shi@kyligence.io
 
 
  
Kyligence Inc: https://kyligence.io/
 
  
 
 
  
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
 
  
Join Kylin user mail group: user-subscribe@kylin.apache.org
 
  
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
   
On Sat, Dec 1, 2018 at 12:30 AM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
Hi, Shaofeng:
 
Your suggestion helped. Step 3 of the cube build now gets further, but it fails with a different problem. Here is the stderr log:
 
 
 
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
 
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
 
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
 
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
 
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
 
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
 
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases 
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=* 
 
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
 
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
 
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
 
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default    
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp 
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
 
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
 
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69 
 
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause:  Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
        at java.lang.reflect.Method.invoke(Method.java:606)
 
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
 
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found  in database 'zetticsdw';
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at scala.Option.getOrElse(Option.scala:121)
 
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at  org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
 
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
 
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
 
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
 
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
 
        ... 6 more
 
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct.  Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
 
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
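 
A side note on the log above: the line "underlying DB is DERBY" and the warehouse location under a local "spark-warehouse" directory suggest that Spark started its own embedded metastore instead of connecting to the cluster's Hive metastore, which would be consistent with the intermediate table not being found. Below is a small Python sketch that pulls such lines out of a saved log. The marker strings are copied from the log above; treating them as hints, and the function name, are assumptions.

```python
from typing import Iterable, List

# Substrings copied from the log above; using them as hints is an assumption.
EMBEDDED_METASTORE_HINTS = (
    "underlying DB is DERBY",
    "spark-warehouse",
)


def embedded_metastore_hints(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain any of the hint substrings."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(hint in line for hint in EMBEDDED_METASTORE_HINTS)
    ]
```

It can be fed an open file, for example the output saved from "yarn logs -applicationId ...". An empty result does not prove the metastore is configured correctly; it only means these particular lines are absent.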
 
 
 
Kang-sen
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Friday, November 30, 2018 8:53 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
One solution is to put a "java-opts" file in the spark/conf folder that sets 'hdp.version', like this:
  
 
 
   
cat /usr/local/spark/conf/java-opts
 
  
-Dhdp.version=2.4.0.0-169
 
  
 
 
  
 
 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
   
On Fri, Nov 30, 2018 at 9:04 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
 
 
 
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
 
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 
#
 
## uncomment for HDP
 
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
 
 
But I still get the same error.
 
 
 
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh:  line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
                at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
 
                at org.apache.hadoop.util.Shell.run(Shell.java:848)
 
                at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
 
                at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
 
 
 
I also saw in stderr:
 
 
 
Log Type: stderr 
 
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018 
 
Log Length: 88 
 
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
 
 
 
I suspect my problem is that “${hdp.version}” was never resolved. It seems that kylin.properties settings like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
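 
For what it is worth, the "bad substitution" message itself comes from the shell: a name such as hdp.version contains a dot, which is not allowed in a shell variable name, so bash cannot expand ${hdp.version} when launch_container.sh runs. The Python sketch below lists such placeholders in a classpath string. It is deliberately simple: the function name is hypothetical, and it also flags legitimate bash forms such as ${VAR:-default}, which do not appear in the classpath above.

```python
import re
from typing import List

# Captures the text between "${" and "}".
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")
# A plain shell variable name: letters, digits, underscores, not starting with a digit.
_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def bad_substitutions(classpath: str) -> List[str]:
    """Return ${...} names that are not plain shell variable names, in order, without repeats."""
    names: List[str] = []
    for name in _PLACEHOLDER.findall(classpath):
        if not _SHELL_NAME.match(name) and name not in names:
            names.append(name)
    return names
```

Applied to the classpath in the error above, this reports hdp.version as the only offending placeholder, which matches the suspicion that the value has to be substituted before the container launch script is generated.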
 
 
 
Kang-sen
 
From: Yichen Zhou <zh...@gmail.com> 
 Sent: Thursday, November 29, 2018 9:08 PM
 To: user@kylin.apache.org
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
Hi Kang-Sen,
  
 
 
  
I think Jiatao is right. To build cubes with Spark on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties:
 
   ## uncomment for HDP
   #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
   #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
   #kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
 
  
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
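 
To double-check that these lines are really uncommented, a quick Python sketch like the following can help. It reads kylin.properties text and returns only the active (uncommented) HDP-related settings. The three keys come from the snippet above; the function name is hypothetical and the parsing is a simplification of the properties format (no line continuations or escapes).

```python
from typing import Dict

# The three keys from the "uncomment for HDP" block in kylin.properties.
HDP_KEYS = (
    "kylin.engine.spark-conf.spark.driver.extraJavaOptions",
    "kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions",
    "kylin.engine.spark-conf.spark.executor.extraJavaOptions",
)


def active_hdp_options(properties_text: str) -> Dict[str, str]:
    """Return the uncommented HDP-related settings found in the given text."""
    found: Dict[str, str] = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        # Split on the first "=" only, so "-Dhdp.version=..." stays intact.
        key, sep, value = line.partition("=")
        if sep and key.strip() in HDP_KEYS:
            found[key.strip()] = value.strip()
    return found
```

If the returned dictionary has fewer than three entries, at least one of the settings is still commented out or misspelled.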
 
  
 
 
  
Regards,
 
  
Yichen
 
  
 
 
  
 
 
   
On Fri, Nov 30, 2018 at 9:57 AM, JiaTao Tao <ta...@gmail.com> wrote:
 
       
Hi
 
  
 
 
  
I searched online and found these links. Give them a try; I hope they help.
 
  
 
 
  
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
 
  
 
 
  
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
 
  
 
 
  
-- 
      
 
 
Regards!
 
Aron Tao
   
On Thu, Nov 29, 2018 at 3:11 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
We are running Kylin 2.5.1. For one particular cube, building one hour of data took 200 minutes, so I am considering building the cube with Spark instead of MapReduce.
 
In the cube design, under Advanced Setting, I selected Spark as the build engine.
 
 
 
The cube build failed at step 3, with the following error log:
 
 
 
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
 
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
 
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
 
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
 
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
 
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
 
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
 
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
 
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
 
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
 
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed  cache.
 
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
 
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to: 
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to: 
 
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zettics); groups with view permissions:  Set(); users  with modify permissions: Set(zettics); groups with modify permissions: Set()
 
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
 
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
 
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:40 INFO yarn.Client: 
 
         client token: N/A
 
        diagnostics: AM container is launched, waiting for AM container to Register with RM
 
        ApplicationMaster host: N/A
 
        ApplicationMaster RPC port: -1
 
        queue: default
 
        start time: 1543503039903
 
        final status: UNDEFINED
 
        tracking URL:  http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
 
        user: zettics
 
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
 
18/11/29 09:50:43 INFO yarn.Client: 
 
         client token: N/A
 
        diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with  exitCode: 1
 
For more detailed output, check the application tracking page:  http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
 
Diagnostics: Exception from container-launch.
 
Container id: container_e05_1543422353836_0088_02_000001
 
Exit code: 1
 
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:  line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
 
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
 
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
 
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
 
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
 
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
 
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 
        at java.lang.Thread.run(Thread.java:745)
 
 
 
 
 
Thanks.
 
 
 
Kang-sen