Posted to user@kylin.apache.org by Chao Long <wa...@qq.com> on 2018/12/09 16:49:55 UTC

Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi KangSen,
   There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data:
   https://issues.apache.org/jira/browse/KYLIN-3699


------------------
Best Regards,
Chao Long


------------------ Original Message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;

Subject: RE: anybody used spark to build cube in kylin 2.5.1?



  
I am able to build a cube with Spark. I am using Kylin 2.5.1 with Hive 1.2.1000.2.5.6.0-40.
 
 
 
I need to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
 
 
 
In addition, if I build a cube when there is no input data, the cube build fails at step 7. Otherwise, it works OK.
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Friday, December 07, 2018 11:35 AM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
The Spark cube build does not correctly support flat-table storage formats other than SEQUENCEFILE.
 
 
 
In my kylin.properties, I changed from:
 
kylin.source.hive.flat-table-storage-format=TEXTFILE
 
to:
 
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
 
 
 
Then I restarted Kylin.
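
For anyone repeating this change, it can help to confirm which value is actually in effect before restarting. The following is a minimal Python sketch, not part of Kylin: the helper name `read_property` and the sample text are hypothetical, and it assumes the usual Java properties behaviour where a later assignment of the same key overrides an earlier one.

```python
# Minimal sketch: read one key from Java-style .properties text.
# Handles only simple "key=value" lines and "#"/"!" comments; line
# continuations and escapes of the full .properties format are ignored.

def read_property(text, key, default=None):
    """Return the last value assigned to `key`, or `default` if absent."""
    value = default
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()  # assumption: later assignments win
    return value


KEY = "kylin.source.hive.flat-table-storage-format"

# Hypothetical excerpt of kylin.properties, for illustration only.
sample = """
# flat table storage format
kylin.source.hive.flat-table-storage-format=TEXTFILE
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
"""

fmt = read_property(sample, KEY)
print(fmt)
```

In practice the text would come from the real file, for example `open("conf/kylin.properties").read()`, where the path is also an assumption about the install layout.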
 
The Spark cube build passed step 3 and failed at step 7:
 
#7 Step Name: Build Cube with Spark
 Duration: 1.45 mins  Waiting: 0 seconds
 
 
 
The error is the same as the one reported in KYLIN-3699.
 
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
 
 
 
Thanks.
 
 
 
Kang-sen
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Thursday, December 06, 2018 2:11 PM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I compared the spark-submit cmd logged in my kylin.log file with the one included in the Kylin doc, “build cube with spark”, and I can see that my cmd is missing this option:

“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.

Here is my cmd:

2018-12-06 11:50:02,665 INFO  [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.cores=1  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec  --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.master=yarn  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true  --conf spark.executor.instances=40  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.executor.memory=4G  --conf spark.yarn.queue=default  --conf spark.submit.deployMode=cluster  --conf spark.dynamicAllocation.minExecutors=1  --conf spark.network.timeout=600  --conf spark.hadoop.dfs.replication=2  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.dynamicAllocation.executorIdleTimeout=300  --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history  --conf spark.driver.memory=2G  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.eventLog.enabled=true  --conf spark.shuffle.service.enabled=true  --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog  --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar  --conf spark.dynamicAllocation.maxExecutors=1000  --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata

And this is the cmd in the Kylin doc:

2017-03-06 14:44:38,574 INFO  [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit  --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  --conf spark.yarn.queue=default  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.driver.extraJavaOptions=-Dhdp.version=current   --conf spark.master=yarn  --conf spark.executor.extraJavaOptions=-Dhdp.version=current  --conf spark.executor.memory=1G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2  --conf spark.submit.deployMode=cluster  --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e  -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube

My question is: what config parameter can cause this difference?

Thanks.

Kang-sen
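
Comparing two long spark-submit lines by eye is error-prone, so the comparison described above can also be done mechanically. This is a rough Python sketch under stated assumptions: the function name `submit_options` and the two shortened command strings are hypothetical, and it assumes every `--option` before the application jar takes exactly one value, which holds for `--class`, `--conf`, `--files` and `--jars` as used in this thread.

```python
import shlex


def submit_options(cmd):
    """Collect spark-submit options that appear before the application jar.

    Returns (conf, flags): `conf` maps each --conf key to its value and
    `flags` maps every other --option to its value.
    """
    tokens = shlex.split(cmd)
    # Start right after the spark-submit executable.
    i = next(n for n, t in enumerate(tokens) if t.endswith("spark-submit")) + 1
    conf, flags = {}, {}
    # Stop at the first token that is not a "--option" (the application jar).
    while i + 1 < len(tokens) and tokens[i].startswith("--"):
        name, value = tokens[i], tokens[i + 1]
        if name == "--conf":
            key, _, val = value.partition("=")
            conf[key] = val
        else:
            flags[name] = value
        i += 2
    return conf, flags


# Shortened, hypothetical commands; paste the real ones from kylin.log instead.
mine = ("/opt/kylin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry "
        "--conf spark.master=yarn --conf spark.executor.memory=4G "
        "--jars /opt/kylin/lib/kylin-job.jar /opt/kylin/lib/kylin-job.jar -className X")
doc = ("/opt/kylin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry "
       "--conf spark.master=yarn --conf spark.executor.memory=1G "
       "--files /etc/hbase/conf/hbase-site.xml /opt/kylin/lib/kylin-job.jar -className Y")

mine_conf, mine_flags = submit_options(mine)
doc_conf, doc_flags = submit_options(doc)

missing = sorted(set(doc_flags) - set(mine_flags))   # options only in the doc cmd
extra = sorted(set(mine_flags) - set(doc_flags))     # options only in my cmd
changed = sorted(k for k in mine_conf.keys() & doc_conf.keys()
                 if mine_conf[k] != doc_conf[k])     # same --conf key, different value
print(missing, extra, changed)
```

The sketch only reports differences in the submit options; it does not say which Kylin setting produced them.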
 
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Wednesday, December 05, 2018 4:59 PM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionURL</name>
 
  <value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionDriverName</name>
 
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
 
  <description>Driver class name for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>hive.hwi.war.file</name>
 
  <value>/usr/lib/hive/lib/hive-hwi-.war</value>
 
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
 
</property>
 
 
 
<property>
 
  <name>hive.metastore.uris</name>
 
  <value>thrift://anovadata6.anovadata.local:9083</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
<property>
 
  <name>hive.metastore.warehouse.dir</name>
 
  <value>/apps/hive/warehouse</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
But in the Spark job's stderr, I still see that Spark thinks the metastore is DERBY:
 
 
 
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
 
 
Does that mean that the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
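
The DERBY line is a useful signal on its own: when Spark finds no Hive configuration it creates a local Derby-backed metastore, so seeing that line in the driver's stderr usually means the remote metastore settings were not applied. A small Python sketch for checking a saved log follows; the function name `metastore_backends` is hypothetical, and the pattern is based only on the log line quoted above.

```python
import re

# Pattern taken from the MetaStoreDirectSql line quoted in this thread.
DB_LINE = re.compile(r"MetaStoreDirectSql: Using direct SQL, underlying DB is (\w+)")


def metastore_backends(log_text):
    """Return the distinct backing-DB names reported in a log, in first-seen order."""
    seen = []
    for match in DB_LINE.finditer(log_text):
        db = match.group(1)
        if db not in seen:
            seen.append(db)
    return seen


log = ("18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: "
       "Using direct SQL, underlying DB is DERBY\n")

backends = metastore_backends(log)
print(backends)
if "DERBY" in backends:
    # Interpretation is an assumption: an embedded metastore was opened
    # inside the Spark driver rather than the shared Hive metastore.
    print("Driver opened a Derby-backed metastore; hive-site.xml may not have been applied.")
```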
 
 
 
Kang-sen
 
 
   
From: Kang-Sen Lu <kl...@anovadata.com> 
 Sent: Wednesday, December 05, 2018 9:32 AM
 To: user@kylin.apache.org
 Subject: RE: anybody used spark to build cube in kylin 2.5.1?
 
 
 
 
 
Hi, Shaofeng:
 
 
 
I am not sure how to give Spark access to the Hive table that was built by Kylin.

I searched the internet for Spark and Hive integration, but I failed to find a concrete example.

Anyway, I updated my kylin/spark/conf/hive-site.xml:
 
 
 
<property>
 
  <name>javax.jdo.option.ConnectionURL</name>
 
  <value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
 
  <description>JDBC connect string for a JDBC metastore</description>
 
</property>
 
 
 
And restarted Kylin. But I still get the following error:
 
 
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
 
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
 
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
 
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
 
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
 
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
 
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
 
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
 
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
 
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases  
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*   
 
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
 
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
 
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
 
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
 
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default      
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp  
 
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef  
 
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause:  Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
 
 
My question is: why is Spark not able to find the Hive metastore location?

If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
 
 
 
Kang-sen
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Monday, December 03, 2018 7:53 PM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
Please double-check it; the error message is clear. Also try searching for Spark + Hive integration.
 
  
 
 
  
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
 
 

 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
Apache Kylin PMC
 
  
Work email:  shaofeng.shi@kyligence.io
 
 
  
Kyligence Inc: https://kyligence.io/
 
  
 
 
  
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
 
  
Join Kylin user mail group:  user-subscribe@kylin.apache.org
 
  
Join Kylin dev mail group:  dev-subscribe@kylin.apache.org

   
On Monday, December 3, 2018 at 9:33 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
Hi, Shaofeng:
 
 
 
Thanks for the reply.
 
 
 
This is a line in my kylin.properties:
 
 
 
kylin.source.hive.flat-table-storage-format=TEXTFILE
 
 
 
I copied hive-site.xml into spark/conf and tried to resume the cube rebuild.

(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)

The cube build still failed; the stderr log is as follows:
 
 
 
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification  is not enabled so recording the schema version 1.2.0
 
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases  
 
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*   
 
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged  as "embedded-only" so does not have its own datastore table.
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
 
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
 
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
 
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
 
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default      
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp  
 
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
 
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef  
 
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct.  Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
        at java.lang.reflect.Method.invoke(Method.java:606)
 
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
 
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'  not found in database 'zetticsdw';
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at scala.Option.getOrElse(Option.scala:121)
 
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at  org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
 
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
 
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
 
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
 
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
 
        ... 6 more
 
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception:  java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
 
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
 
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
 
 
 
 
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Sunday, December 02, 2018 2:04 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
  
Hi Kang-sen,
  
 
 
  
When the intermediate table's file format is not sequence file, Kylin uses the Hive catalog to parse the data into an RDD. In this case, it needs "hive-site.xml" in the spark/conf folder. Please confirm whether this is the case; if so, put the file there and try again.
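
As a quick sanity check before retrying, the copied file can be parsed to confirm that it names a metastore at all. The Python sketch below is illustrative only: the helper name `hadoop_conf_to_dict` is hypothetical, the sample XML is a cut-down stand-in for a real hive-site.xml, and it assumes the file has the usual single `<configuration>` root element.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_to_dict(xml_text):
    """Parse Hadoop-style <configuration><property>... XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Cut-down sample; a real hive-site.xml has many more properties.
sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
</configuration>"""

conf = hadoop_conf_to_dict(sample)
uris = conf.get("hive.metastore.uris", "")
print(uris or "hive.metastore.uris is not set")
```

This only checks the file's content; whether the Spark driver actually loads it still has to be confirmed from the job's stderr.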
 
  

 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
   
On Saturday, December 1, 2018 at 12:30 AM, Kang-Sen Lu <kl...@anovadata.com> wrote:
 
    
Hi, Shaofeng:

Your suggestion led to some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
 
 
 
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
 
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
 
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO handler.ContextHandler: Started  o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
 
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
 
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
 
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
 
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
 
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged  as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"  so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged  as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only"  so does not have its own datastore table.
 
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
 
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification  is not enabled so recording the schema version 1.2.0
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases  
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*   
 
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged  as "embedded-only" so does not have its own datastore table.
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
 
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
 
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
 
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
 
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default      
 
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
 
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp  
 
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
 
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
 
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
 
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69  
 
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct.  Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
 
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'  not found in database 'zetticsdw';
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
        at java.lang.reflect.Method.invoke(Method.java:606)
 
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
 
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'  not found in database 'zetticsdw';
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
 
        at scala.Option.getOrElse(Option.scala:121)
 
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
 
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at  org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
 
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
 
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
 
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
 
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
 
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
 
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
 
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
 
        ... 6 more
 
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception:  java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
 
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
 
 
 
Kang-sen
 
 
 
From: ShaoFeng Shi <sh...@apache.org> 
 Sent: Friday, November 30, 2018 8:53 AM
 To: user <us...@kylin.apache.org>
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
A solution is to put a "java-opts" file in the spark/conf folder that sets the 'hdp.version' property, like this:
  
 
 
   
cat /usr/local/spark/conf/java-opts
 
  
-Dhdp.version=2.4.0.0-169
 
  
 
 
  
 
 
        
Best regards,
  
 
 
  
Shaofeng Shi 史少锋
 
  
Apache Kylin PMC
 
  
Work email: shaofeng.shi@kyligence.io
 
 
  
Kyligence Inc: https://kyligence.io/
 
  
 
 
  
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
 
  
Join Kylin user mail group: user-subscribe@kylin.apache.org
 
  
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
 
  
   
Kang-Sen Lu <kl...@anovadata.com> wrote on Friday, November 30, 2018 at 9:04 PM:
 
    
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
 
 
 
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
 
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 
#
 
## uncomment for HDP
 
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
 
 
 
But I still get the same error.
 
 
 
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh:  line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
                at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
 
                at org.apache.hadoop.util.Shell.run(Shell.java:848)
 
                at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
 
                at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
 
 
 
I also saw in stderr:
 
 
 
Log Type: stderr 
 
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018 
 
Log Length: 88 
 
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
 
 
 
I suspect my problem is that “${hdp.version}” was not resolved. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
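
A note on the "bad substitution" message: bash cannot expand ${hdp.version} because a dot is not valid in a shell variable name, so the placeholder has to be replaced with a real value before launch_container.sh is generated. Which configuration file carries the placeholder is not confirmed in this thread. The following is a minimal Python sketch, with a hypothetical helper name and a shortened classpath, that lists such dotted placeholders in a classpath string:

```python
import re

# Matches ${...} placeholders whose name contains at least one dot.
# Bash cannot expand these, since dots are not valid in shell variable names.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z0-9_]+)+)\}")


def unresolved_placeholders(text):
    """Return the sorted, de-duplicated names of dotted ${...} placeholders."""
    return sorted(set(PLACEHOLDER.findall(text)))


# Shortened version of the classpath from the error message (illustrative only).
classpath = (
    "$PWD:$PWD/__spark_conf__:$HADOOP_CONF_DIR:"
    "/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:"
    "/etc/hadoop/conf/secure"
)

print(unresolved_placeholders(classpath))
```

Plain shell variables such as $PWD or ${HOME} are not reported, because only names containing a dot are matched.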
 
 
 
Kang-sen
 
From: Yichen Zhou <zh...@gmail.com> 
 Sent: Thursday, November 29, 2018 9:08 PM
 To: user@kylin.apache.org
 Subject: Re: anybody used spark to build cube in kylin 2.5.1?
 
 
   
Hi Kang-Sen,
  
 
 
  
I think Jiatao is right. If you want to use Spark to build a cube on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
 
   ## uncomment for HDP
   #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
   #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
   #kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
 
  
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
 
  
 
 
  
Regards,
 
  
Yichen
 
  
 
 
  
 
 
   
JiaTao Tao <ta...@gmail.com> wrote on Friday, November 30, 2018 at 9:57 AM:
 
       
Hi
 
  
 
 
  
I searched the Internet and found these links. Give them a try; I hope they help.
 
  
 
 
  
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
 
  
 
 
  
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
 
  
 
 
  
-- 
      
 
 
Regards!
 
Aron Tao
 
 
   
Kang-Sen Lu <kl...@anovadata.com> wrote on Thursday, November 29, 2018 at 3:11 PM:
 
    
We are running Kylin 2.5.1. For one particular cube, building one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
 
 
 
I selected Spark in the cube design, under the advanced settings.
 
 
 
The cube build failed at step 3, with the following error log:
 
 
 
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager  at anovadata6.anovadata.local/192.168.230.199:8050
 
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
 
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability  of the cluster (191488 MB per container)
 
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
 
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
 
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
 
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
 
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries  under SPARK_HOME.
 
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip  -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
 
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar  -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
 
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar  added multiple times to distributed cache.
 
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip  -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
 
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to: 
 
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to: 
 
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view  permissions: Set(zettics); groups with view permissions: Set(); users  with modify permissions: Set(zettics); groups with modify permissions: Set()
 
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
 
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
 
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:40 INFO yarn.Client: 
 
         client token: N/A
 
        diagnostics: AM container is launched, waiting for AM container to Register with RM
 
        ApplicationMaster host: N/A
 
        ApplicationMaster RPC port: -1
 
        queue: default
 
        start time: 1543503039903
 
        final status: UNDEFINED
 
        tracking URL:  http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
 
        user: zettics
 
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
 
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
 
18/11/29 09:50:43 INFO yarn.Client: 
 
         client token: N/A
 
        diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002  exited with  exitCode: 1
 
For more detailed output, check the application tracking page:  http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
 
Diagnostics: Exception from container-launch.
 
Container id: container_e05_1543422353836_0088_02_000001
 
Exit code: 1
 
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:  line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:  line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:  bad substitution
 
 
 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
 
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
 
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
 
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
 
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
 
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
 
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 
        at java.lang.Thread.run(Thread.java:745)
 
 
 
 
 
Thanks.
 
 
 
Kang-sen

RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Posted by Kang-Sen Lu <kl...@anovadata.com>.

Hi, Chao: (I hope I got your first name right.)

Thanks for the reply. I see that KYLIN-3699 was opened to address this problem.

I believe no bug has been opened yet for the problem that the Spark cube build only supports SEQUENCEFILE. Right?

Kang-sen

From: Chao Long <wa...@qq.com>
Sent: Sunday, December 09, 2018 11:50 AM
To: user <us...@kylin.apache.org>
Subject: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi KangSen,
   There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data.
   https://issues.apache.org/jira/browse/KYLIN-3699

------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu" <kl...@anovadata.com>
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org"
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

I am able to build the cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.

I needed to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.

In addition, if I build a cube when there is no input data, the cube build fails at step 7. Otherwise, it works OK.

Thanks.

Kang-sen


From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

The Spark cube build does not correctly support flat-table storage formats other than SEQUENCEFILE.

In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
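
A small Python sketch for checking which flat-table format a kylin.properties file currently sets. The parser and sample text are illustrative assumptions: it handles only simple key=value lines, not the full Java properties syntax, and the sample stands in for the real file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


KEY = "kylin.source.hive.flat-table-storage-format"

# Sample content standing in for $KYLIN_HOME/conf/kylin.properties.
sample = """
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.source.hive.flat-table-storage-format=TEXTFILE
"""

props = parse_properties(sample)
fmt = props.get(KEY)
if fmt != "SEQUENCEFILE":
    print(f"{KEY} is {fmt!r}; this thread reports that Spark cubing needed SEQUENCEFILE")
```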

Then restarted kylin.
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds

The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042

Thanks.

Kang-sen

From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I compared the spark execution cmd logged in my kylin.log file with the one included in the Kylin doc, “build cube with spark”. I can see that my cmd is missing this option:


“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.



Here is my cmd:



2018-12-06 11:50:02,665 INFO  [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.cores=1  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec  --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.master=yarn  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true  --conf spark.executor.instances=40  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.executor.memory=4G  --conf spark.yarn.queue=default  --conf spark.submit.deployMode=cluster  --conf spark.dynamicAllocation.minExecutors=1  --conf spark.network.timeout=600  --conf spark.hadoop.dfs.replication=2  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.dynamicAllocation.executorIdleTimeout=300  --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history  --conf spark.driver.memory=2G  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.eventLog.enabled=true  --conf spark.shuffle.service.enabled=true  --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog  --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar  --conf spark.dynamicAllocation.maxExecutors=1000  --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput 
hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata



And this is the cmd from the Kylin doc:


2017-03-06 14:44:38,574 INFO  [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  --conf spark.yarn.queue=default  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn  --conf spark.executor.extraJavaOptions=-Dhdp.version=current  --conf spark.executor.memory=1G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2  --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube



My question is “what config parameter can cause this difference”?
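
Comparing two long spark-submit lines by eye is error-prone, so here is a rough Python sketch that splits each command into its --conf settings and other double-dash options and reports what appears on only one side. The function names and the shortened sample commands are hypothetical. Note also that the two commands in this message run different classes (SparkFactDistinct vs. SparkCubingByLayer), so some differences may come from the build step rather than from configuration.

```python
import shlex


def parse_spark_submit(cmd):
    """Split a spark-submit command into --conf settings and other --options."""
    tokens = shlex.split(cmd)
    conf, flags = {}, {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--conf" and i + 1 < len(tokens):
            key, _, value = tokens[i + 1].partition("=")
            conf[key] = value
            i += 2
        elif tok.startswith("--") and i + 1 < len(tokens):
            # Any other double-dash option is assumed to take one argument.
            flags[tok] = tokens[i + 1]
            i += 2
        else:
            # Positional arguments and single-dash application options are skipped.
            i += 1
    return conf, flags


def diff_commands(mine, reference):
    """Report option names that appear in only one of the two commands."""
    mine_conf, mine_flags = parse_spark_submit(mine)
    ref_conf, ref_flags = parse_spark_submit(reference)
    return {
        "flags_only_in_mine": sorted(set(mine_flags) - set(ref_flags)),
        "flags_only_in_reference": sorted(set(ref_flags) - set(mine_flags)),
        "conf_only_in_mine": sorted(set(mine_conf) - set(ref_conf)),
        "conf_only_in_reference": sorted(set(ref_conf) - set(mine_conf)),
    }


# Shortened, illustrative stand-ins for the two commands quoted above.
mine = (
    "spark-submit --class org.apache.kylin.common.util.SparkEntry "
    "--conf spark.master=yarn --conf spark.executor.instances=40 "
    "--jars /path/kylin-job.jar /path/kylin-job.jar "
    "-className org.apache.kylin.engine.spark.SparkFactDistinct"
)
reference = (
    "spark-submit --class org.apache.kylin.common.util.SparkEntry "
    "--conf spark.master=yarn "
    "--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /path/kylin-job.jar "
    "-className org.apache.kylin.engine.spark.SparkCubingByLayer"
)

for name, values in diff_commands(mine, reference).items():
    print(name, values)
```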



Thanks.



Kang-sen




From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on the Hive config data from my Ambari.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive-hwi-.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://anovadata6.anovadata.local:9083</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

But in spark run stderr, I still see that spark thinks the metastore is DERBY:

18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

Does that mean that, somehow, the Spark job that builds the cube does not pick up hive-site.xml from the …/spark/conf dir?
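
One thing that can be checked offline is whether the hive-site.xml parses and contains the expected keys. The Python sketch below reads a Hadoop-style configuration file and prints the two metastore settings. It assumes the properties are wrapped in a single <configuration> root element, which the excerpt quoted above does not show. The function name and inline sample are illustrative. Whether the file actually reaches the Spark driver in yarn cluster mode is a separate question that this check cannot answer.

```python
import xml.etree.ElementTree as ET


def read_hadoop_xml(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Inline sample; in practice the text would be read from hive-site.xml.
sample = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>
</configuration>"""

props = read_hadoop_xml(sample)
for key in ("hive.metastore.uris", "hive.metastore.warehouse.dir"):
    print(key, "=", props.get(key, "<not set>"))
```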

Kang-sen

From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I am not sure how to allow Spark to access the Hive table that was built by Kylin.

I searched the Internet for Spark and Hive integration, but I could not find a concrete example.

Anyway, I updated my kylin/spark/conf/hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

Then I restarted Kylin, but I still get the following error:

18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)

My question is: why is Spark not able to find the Hive metastore location?

If you can point me to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.

Kang-sen

From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Please double-check it; the error message is clear. It is also worth searching for Spark + Hive.

If possible, we suggest using the sequence file format (the default config) for the intermediate Hive table.
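
As a rough illustration of this advice, the Python sketch below reads the storage-format setting from the text of a kylin.properties file. The parser is a simplification (it only handles key=value lines and # comments), the function names are hypothetical, and falling back to SEQUENCEFILE when the key is absent is an assumption based on the "default config" remark above.

```python
FORMAT_KEY = "kylin.source.hive.flat-table-storage-format"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def flat_table_format(text):
    """Return the configured flat-table storage format in upper case."""
    # Assumption: SEQUENCEFILE applies when the key is not set explicitly.
    return parse_properties(text).get(FORMAT_KEY, "SEQUENCEFILE").upper()


if __name__ == "__main__":
    sample = "kylin.source.hive.flat-table-storage-format=TEXTFILE\n"
    print(flat_table_format(sample))
```

Running the check before resuming a build makes it easy to see which format the job will actually use.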

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




On Monday, December 3, 2018 at 9:33 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Hi, Shaofeng:

Thanks for the reply.

This is a line in my kylin.properties:

kylin.source.hive.flat-table-storage-format=TEXTFILE

I copied hive-site.xml into spark/conf and tried to resume the cube rebuild.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)

The cube build still failed. The stderr log is as follows:

18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        ... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}



From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-sen,

When the intermediate table's file format is not sequence file, Kylin uses the Hive catalog to parse the data into an RDD. In that case, it needs "hive-site.xml" in the spark/conf folder. Please confirm whether this is your case; if so, put the file there and try again.
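
One way to sanity-check the copied file is to look up the metastore address in it. The Python sketch below is only an illustration: it assumes the usual Hadoop-style <configuration>/<property> layout and the Hive property name hive.metastore.uris, and the sample host name is made up. If the property is missing or the file is not picked up, Spark may fall back to a local embedded metastore, which would be consistent with a "Table or view not found" error.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; a real hive-site.xml has many more properties.
SAMPLE_HIVE_SITE = """
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
"""


def hive_property(hive_site_xml, name):
    """Return the value of a <property> entry by name, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    uris = hive_property(SAMPLE_HIVE_SITE, "hive.metastore.uris")
    print(uris if uris else "hive.metastore.uris is not set")
```

To check a real file, read its contents (for example from spark/conf/hive-site.xml) and pass the text to hive_property.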

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC




On Saturday, December 1, 2018 at 12:30 AM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Hi, Shaofeng:

Your suggestion made some progress. Step 3 of the cube build now gets further and shows another problem. Here is the stderr log:

18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        ... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook

Kang-sen

From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

One solution is to put a "java-opts" file in the spark/conf folder that sets the 'hdp.version' property, like this:

cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC




On Friday, November 30, 2018 at 9:04 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:

kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40

But I still get the same error.

Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

                at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
                at org.apache.hadoop.util.Shell.run(Shell.java:848)
                at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
                at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)

I also saw in stderr:

Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I suspect my problem is that “${hdp.version}” was somehow not resolved. It seems that setting kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” was not enough.
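
For context on the "bad substitution" message: a POSIX shell reads ${hdp.version} as a parameter expansion, and a dot is not allowed in a shell variable name, so the launch script fails if the placeholder is still present when it runs. The Python sketch below lists such placeholders in a classpath string. It is a simplified illustration with a hypothetical function name; it only understands the plain ${NAME} form, so expansions such as ${NAME:-default} would be flagged as well.

```python
import re

# Matches ${...} and captures the text between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")
# A plain shell variable name: letters, digits, underscores, no leading digit.
VALID_SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bad_substitutions(text):
    """Return ${...} names that are not valid plain shell variable names."""
    return [
        name
        for name in PLACEHOLDER.findall(text)
        if not VALID_SHELL_NAME.fullmatch(name)
    ]


if __name__ == "__main__":
    # Fragment taken from the classpath in the error message above.
    classpath = (
        "$PWD:$HADOOP_CONF_DIR:"
        "/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar"
    )
    for name in bad_substitutions(classpath):
        print("unresolved placeholder:", name)
```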

Kang-sen





From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-Sen,

I think Jiatao is right. If you want to use Spark to build cubes in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.

## uncomment for HDP

#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current

#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current

#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
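
A quick way to confirm that the three options are actually uncommented is sketched below in Python. The key names are the ones quoted above; the function name and the simplified key=value parsing are assumptions for illustration only.

```python
HDP_KEYS = [
    "kylin.engine.spark-conf.spark.driver.extraJavaOptions",
    "kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions",
    "kylin.engine.spark-conf.spark.executor.extraJavaOptions",
]


def missing_hdp_options(text):
    """Return the keys that are absent, commented out, or lack -Dhdp.version."""
    active = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' are comments, so commented keys are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        active[key.strip()] = value.strip()
    return [k for k in HDP_KEYS if "-Dhdp.version=" not in active.get(k, "")]


if __name__ == "__main__":
    sample = (
        "## uncomment for HDP\n"
        "#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current\n"
    )
    for key in missing_hdp_options(sample):
        print("not set:", key)
```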

Regards,
Yichen


On Friday, November 30, 2018 at 9:57 AM, JiaTao Tao <ta...@gmail.com> wrote:
Hi

I searched the Internet and found these links. Give them a try; I hope they help.

https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html

https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster

--

Regards!
Aron Tao


On Thursday, November 29, 2018 at 3:11 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
We are running Kylin 2.5.1. For one specific cube, the build for one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.

I selected Spark under Advanced Setting in the cube design.

The cube build failed at step 3, with the following error log:

OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zettics); groups with view permissions: Set(); users  with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
         client token: N/A
        diagnostics: AM container is launched, waiting for AM container to Register with RM
        ApplicationMaster host: N/A
        ApplicationMaster RPC port: -1
        queue: default
        start time: 1543503039903
        final status: UNDEFINED
        tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
        user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
         client token: N/A
        diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


Thanks.

Kang-sen