Posted to user@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2018/12/02 07:03:37 UTC
Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not SequenceFile, Kylin uses the
Hive catalog to read the data into an RDD. In that case, it needs
"hive-site.xml" in the spark/conf folder. Please confirm whether this is the
case; if so, put the file there and try again.
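For reference, a minimal sketch of staging that file (the /etc/hive/conf source path and the $KYLIN_HOME/spark/conf destination are typical locations, not confirmed for this cluster; temporary directories stand in for both here):

```shell
# Stand-ins for /etc/hive/conf and $KYLIN_HOME/spark/conf on a real node.
HIVE_CONF_DIR=$(mktemp -d)
SPARK_CONF_DIR=$(mktemp -d)

# A minimal hive-site.xml pointing at the metastore (host/port are placeholders).
cat > "$HIVE_CONF_DIR/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
EOF

# Copy it into Spark's conf folder so the Spark job can locate the metastore.
cp "$HIVE_CONF_DIR/hive-site.xml" "$SPARK_CONF_DIR/hive-site.xml"
ls "$SPARK_CONF_DIR"
```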
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
On Sat, Dec 1, 2018 at 12:30 AM, Kang-Sen Lu <kl...@anovadata.com> wrote:
> Hi, Shaofeng:
>
>
>
> Your suggestion made some progress. Step 3 of the cube build now goes further
> but hits another problem. Here is the stderr log:
>
>
>
> 18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path
> hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
>
> 18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext;
> some configuration may not take effect.
>
> 18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is
> 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
>
> 18/11/30 11:14:20 INFO handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
>
> 18/11/30 11:14:20 INFO handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
>
> 18/11/30 11:14:20 INFO handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@53f1ff78
> {/SQL/execution,null,AVAILABLE,@Spark}
>
> 18/11/30 11:14:20 INFO handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@740d6f25
> {/SQL/execution/json,null,AVAILABLE,@Spark}
>
> 18/11/30 11:14:20 INFO handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
>
> 18/11/30 11:14:20 INFO hive.HiveUtils: Initializing
> HiveMetastoreConnection version 1.2.1 using Spark classes.
>
> 18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with
> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>
> 18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize
> called
>
> 18/11/30 11:14:21 INFO DataNucleus.Persistence: Property
> datanucleus.cache.level2 unknown - will be ignored
>
> 18/11/30 11:14:21 INFO DataNucleus.Persistence: Property
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>
> 18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin
> classes with
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>
> 18/11/30 11:14:24 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/11/30 11:14:24 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/11/30 11:14:24 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/11/30 11:14:24 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL,
> underlying DB is DERBY
>
> 18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
>
> 18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not
> found in metastore. hive.metastore.schema.verification is not enabled so
> recording the schema version 1.2.0
>
> 18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database
> default, returning NoSuchObjectException
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in
> metastore
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in
> metastore
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin
> role, since config is empty
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
>
> 18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_all_databases
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions:
> db=default pat=*
>
> 18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_functions: db=default pat=*
>
> 18/11/30 11:14:25 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/11/30 11:14:25 INFO session.SessionState: Created local directory:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
>
> 18/11/30 11:14:25 INFO session.SessionState: Created local directory:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
>
> 18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
>
> 18/11/30 11:14:25 INFO session.SessionState: Created local directory:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
>
> 18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
>
> 18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive
> client (version 1.2.1) is
> file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
>
> 18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_database: default
>
> 18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database:
> global_temp
>
> 18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_database: global_temp
>
> 18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database
> global_temp, returning NoSuchObjectException
>
> 18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command:
> zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
>
> 18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table :
> db=zetticsdw
> tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
>
> 18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_table : db=zetticsdw
> tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
>
>
> 18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw
> exception: java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'
> not found in database 'zetticsdw';
>
> java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'
> not found in database 'zetticsdw';
>
> at
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>
> at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
>
> Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
> Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'
> not found in database 'zetticsdw';
>
> at
> org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
>
> at
> org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
>
> at scala.Option.getOrElse(Option.scala:121)
>
> at
> org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at org.apache.spark.sql.hive.HiveExternalCatalog.org
> $apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>
> at
> org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
>
> at
> org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
>
> at
> org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
>
> at
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
>
> ... 6 more
>
> 18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception:
> java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'
> not found in database 'zetticsdw';)
>
> 18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown
> hook
>
>
>
> Kang-sen
>
>
>
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, November 30, 2018 8:53 AM
> *To:* user <us...@kylin.apache.org>
> *Subject:* Re: anybody used spark to build cube in kylin 2.5.1?
>
>
>
> A solution is to put a "java-opts" file in the spark/conf folder, adding the
> 'hdp.version' property, like this:
>
>
>
> cat /usr/local/spark/conf/java-opts
>
> -Dhdp.version=2.4.0.0-169
>
>
>
>
>
> Best regards,
>
>
>
> Shaofeng Shi 史少锋
>
> Apache Kylin PMC
>
> Work email: shaofeng.shi@kyligence.io
>
> Kyligence Inc: https://kyligence.io/
>
>
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>
> Join Kylin user mail group: user-subscribe@kylin.apache.org
>
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
>
>
>
>
>
> On Fri, Nov 30, 2018 at 9:04 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
>
> Thanks for the reply from Yichen and Aron. This is my kylin.properties:
>
>
>
> kylin.engine.spark-conf.spark.yarn.archive=hdfs://
> 192.168.230.199:8020/user/zettics/spark/spark-libs.jar
>
>
> ##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
>
> #
>
> ## uncomment for HDP
>
>
> kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
> kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
> kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
>
> But I still get the same error.
>
>
>
> Stack trace: ExitCodeException exitCode=1:
> /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:848)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
>
>
>
> I also saw in stderr:
>
>
>
> Log Type: stderr
>
> Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
>
> Log Length: 88
>
> Error: Could not find or load main class
> org.apache.spark.deploy.yarn.ApplicationMaster
>
>
>
> I suspect my problem is related to the fact that “${hdp.version}” was not
> resolved somehow. It seems that kylin.properties parameters like
> “extraJavaOptions=-Dhdp.version=2.5.6.0-40” are not enough.
>
>
>
> Kang-sen
>
>
>
>
>
>
>
>
>
>
>
> *From:* Yichen Zhou <zh...@gmail.com>
> *Sent:* Thursday, November 29, 2018 9:08 PM
> *To:* user@kylin.apache.org
> *Subject:* Re: anybody used spark to build cube in kylin 2.5.1?
>
>
>
> Hi Kang-Sen,
>
>
>
> I think Jiatao is right. If you want to use Spark to build cubes on an HDP
> cluster, you need to configure -Dhdp.version in
> $KYLIN_HOME/conf/kylin.properties.
>
> ## uncomment for HDP
>
> #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
>
> #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
>
> #kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
>
> Please refer to this:
> http://kylin.apache.org/docs/tutorial/cube_spark.html
>
>
>
> Regards,
>
> Yichen
>
>
>
>
>
> On Fri, Nov 30, 2018 at 9:57 AM, JiaTao Tao <ta...@gmail.com> wrote:
>
> Hi
>
>
>
> I searched the Internet and found these links; give them a try and I hope
> they help.
>
>
>
>
> https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
>
>
>
>
> https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
>
>
>
> --
>
>
>
> Regards!
>
> Aron Tao
>
>
>
>
>
> On Thu, Nov 29, 2018 at 3:11 PM, Kang-Sen Lu <kl...@anovadata.com> wrote:
>
> We are running Kylin 2.5.1. For a specific cube, building one hour of data
> took 200 minutes, so I am thinking about building the cube with Spark
> instead of MapReduce.
>
>
>
> I selected Spark in the cube design's advanced settings.
>
>
>
> The cube build failed at step 3, with the following error log:
>
>
>
> OS command error exit with return code: 1, error message: 18/11/29
> 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at
> anovadata6.anovadata.local/192.168.230.199:8050
>
> 18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from
> cluster with 1 NodeManagers
>
> 18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not
> requested more than the maximum memory capability of the cluster (191488 MB
> per container)
>
> 18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432
> MB memory including 384 MB overhead
>
> 18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context
> for our AM
>
> 18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for
> our AM container
>
> 18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM
> container
>
> 18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor
> spark.yarn.archive is set, falling back to uploading libraries under
> SPARK_HOME.
>
> 18/11/29 09:50:38 INFO yarn.Client: Uploading resource
> file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
>
> 18/11/29 09:50:39 INFO yarn.Client: Uploading resource
> file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
>
> 18/11/29 09:50:39 WARN yarn.Client: Same path resource
> file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar
> added multiple times to distributed cache.
>
> 18/11/29 09:50:39 INFO yarn.Client: Uploading resource
> file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
>
> 18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will
> not take effect in cluster mode
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to:
> zettics
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to:
> zettics
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups
> to:
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups
> to:
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(zettics); groups with view permissions: Set(); users with modify
> permissions: Set(zettics); groups with modify permissions: Set()
>
> 18/11/29 09:50:39 INFO yarn.Client: Submitting application
> application_1543422353836_0088 to ResourceManager
>
> 18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application
> application_1543422353836_0088
>
> 18/11/29 09:50:40 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:40 INFO yarn.Client:
>
> client token: N/A
>
> diagnostics: AM container is launched, waiting for AM container to
> Register with RM
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1543503039903
>
> final status: UNDEFINED
>
> tracking URL:
> http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
>
> user: zettics
>
> 18/11/29 09:50:41 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:42 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:43 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: FAILED)
>
> 18/11/29 09:50:43 INFO yarn.Client:
>
> client token: N/A
>
> diagnostics: Application application_1543422353836_0088 failed 2
> times due to AM Container for appattempt_1543422353836_0088_000002 exited
> with exitCode: 1
>
> For more detailed output, check the application tracking page:
> http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088
> Then click on links to logs of each attempt.
>
> Diagnostics: Exception from container-launch.
>
> Container id: container_e05_1543422353836_0088_02_000001
>
> Exit code: 1
>
> Exception message:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> Stack trace: ExitCodeException exitCode=1:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:848)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> Thanks.
>
>
>
> Kang-sen
>
>
>
>
>
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
I am able to build the cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.
I needed to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
In addition, if I build a cube at a time when there is no input data, the build fails at step 7; otherwise it works OK.
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
The Spark cube build does not correctly support non-SEQUENCEFILE formats.
In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
Then I restarted Kylin.
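The change above amounts to flipping one key in kylin.properties; a sketch of the edit (the properties file path is a stand-in — on a real install it is $KYLIN_HOME/conf/kylin.properties, and Kylin must be restarted afterwards):

```shell
# Stand-in for $KYLIN_HOME/conf/kylin.properties.
PROPS=$(mktemp)
echo 'kylin.source.hive.flat-table-storage-format=TEXTFILE' > "$PROPS"

# Flip the flat-table storage format from TEXTFILE to SEQUENCEFILE.
sed -i 's/^kylin.source.hive.flat-table-storage-format=.*/kylin.source.hive.flat-table-storage-format=SEQUENCEFILE/' "$PROPS"
grep 'flat-table-storage-format' "$PROPS"
# prints: kylin.source.hive.flat-table-storage-format=SEQUENCEFILE

# On a real install, restart Kylin to pick up the change:
# $KYLIN_HOME/bin/kylin.sh stop && $KYLIN_HOME/bin/kylin.sh start
```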
The Spark cube build then passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds
The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I compared the Spark execution command logged in my kylin.log file with the one in the Kylin doc “Build Cube with Spark”, and I can see that my command is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.
Here is my cmd:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput 
hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the cmd on kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
My question is: what config parameter causes this difference?
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is DERBY:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
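A small diagnostic sketch of what that log line implies: "underlying DB is DERBY" means Spark created an embedded Derby metastore instead of connecting to the one in hive-site.xml, so the file was likely not on the driver's classpath. (The log file here is a stand-in for the YARN container's stderr.)

```shell
# Stand-in for the YARN container's stderr log.
LOG=$(mktemp)
echo '18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY' > "$LOG"

# DERBY here indicates the Spark job fell back to an embedded metastore,
# i.e. hive-site.xml from spark/conf was not picked up by the driver.
if grep -q 'underlying DB is DERBY' "$LOG"; then
  echo 'hive-site.xml was not picked up'
fi
```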
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.
I searched the Internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
And restarted Kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark not able to find the Hive metastore location?
If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
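For reference, here is a minimal sketch of the metastore-related properties that a hive-site.xml in spark/conf typically needs so that Spark connects to the remote metastore instead of opening a local Derby one. The host and warehouse path are taken from this thread; adjust them to your cluster:

```xml
<configuration>
  <!-- Remote metastore: when this is set, Spark's Hive client connects
       over Thrift instead of creating an embedded Derby metastore. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
  <!-- Warehouse location used by the Hive client. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>
</configuration>
```

Note that when hive.metastore.uris is set, the client side generally does not use the javax.jdo.* JDBC properties; those matter to the metastore service itself.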
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear, so try some searching on Spark + Hive integration.
If possible, we suggest using the sequence file (the default config) for the intermediate Hive table.
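The suggested default can be restored with a single kylin.properties line. This is a sketch assuming SEQUENCEFILE is the stock value that the TEXTFILE setting quoted later in this thread overrode:

```properties
# Use sequence files for the flat intermediate table so the Spark engine
# can read it directly from HDFS without going through the Hive catalog.
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```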
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com>> 于2018年12月3日周一 下午9:33写道:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com>> 于2018年12月1日周六 上午12:30写道:
Hi, Shaofeng:
Your suggestion made some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com>> 于2018年11月30日周五 下午9:04写道:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that "${hdp.version}" was not resolved. It seems that kylin.properties parameters like "extraJavaOptions=-Dhdp.version=2.5.6.0-40" were not enough.
Kang-sen
From: Yichen Zhou <zh...@gmail.com>>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build the cube in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com>> 于2018年11月30日周五 上午9:57写道:
Hi
I searched the internet and found these links; give them a try, hope it helps.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com>> 于2018年11月29日周四 下午3:11写道:
We are running Kylin 2.5.1. For a specific cube, building one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3 with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
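For context on the `bad substitution` above: `${hdp.version}` survives into `launch_container.sh` unresolved, and bash rejects it because dots are not legal in shell parameter names. That is why the `-Dhdp.version=...` Java options matter: the value must be substituted before the classpath reaches bash. A two-line demonstration of the failure mode:

```shell
# bash refuses parameter names containing dots, so an unresolved
# ${hdp.version} in a launch script dies with "bad substitution".
out=$(bash -c 'echo /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo.jar' 2>&1 || true)
echo "$out"
```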
Thanks.
Kang-sen
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
The Spark cube build does not correctly support non-SEQUENCEFILE intermediate tables.
In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
Then restarted kylin.
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds
The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I compared the Spark execution command logged in my kylin.log file against the one included in the Kylin doc “Build Cube with Spark”, and I can see that my command is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.
Here is my cmd:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput 
hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the cmd on kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
My question is: what config parameter could cause this difference?
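One quick way to compare two spark-submit invocations is to extract just the `--files` option from each command string. A sketch, using shortened stand-in commands rather than the full ones quoted above:

```shell
# Print the value of --files from a spark-submit command string,
# or a marker when the option is absent.
files_option() {
  # Word-splitting the command string into tokens is intended here.
  set -- $1
  while [ $# -gt 0 ]; do
    if [ "$1" = "--files" ]; then echo "$2"; return 0; fi
    shift
  done
  echo "(no --files option)"
}

# Trimmed stand-ins for the two commands compared above.
mine="spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.master=yarn kylin-job.jar"
docs="spark-submit --class org.apache.kylin.common.util.SparkEntry --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml kylin-job.jar"

files_option "$mine"
files_option "$docs"
```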
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is Derby:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
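One way to sanity-check which metastore a given hive-site.xml configures is to pull out `hive.metastore.uris` directly. If the property is missing, or the file never reaches the Spark driver (in yarn-cluster mode the driver runs inside the AM container, so a file sitting only in the gateway host's spark/conf may not be visible there), Spark typically falls back to a local embedded Derby metastore, which is consistent with the `underlying DB is DERBY` line above. A rough grep-based sketch; it assumes the `<name>`/`<value>` pair sits on adjacent lines, as in the snippet above:

```shell
# Create a sample hive-site.xml like the one quoted above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
</configuration>
EOF

# Extract the value element that follows the hive.metastore.uris name element.
uris=$(grep -A1 '<name>hive.metastore.uris</name>' "$conf" \
       | sed -n 's|.*<value>\(.*\)</value>.*|\1|p')
echo "${uris:-<unset: Spark will use embedded Derby>}"
```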
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to let Spark gain access to the Hive table that was built by Kylin.
I searched the Internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml,
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
And restarted Kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark not able to find the Hive metastore location?
If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, it would be greatly appreciated.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear, so do some searching on Spark + Hive.
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Mon, Dec 3, 2018 at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and try to resume the cube rebuild.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
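That branch can be paraphrased as follows (a simplification of the behavior described above, not Kylin's actual code):

```shell
# Simplified choice Kylin's Spark engine makes for the flat-table input:
# SEQUENCEFILE is read straight from HDFS; anything else goes through the
# Hive catalog and therefore needs hive-site.xml in spark/conf.
flat_table_reader() {
  case "$1" in
    SEQUENCEFILE) echo "hdfs-sequence-file" ;;
    *)            echo "hive-catalog (needs hive-site.xml)" ;;
  esac
}

flat_table_reader SEQUENCEFILE
flat_table_reader TEXTFILE
```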
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Sat, Dec 1, 2018 at 12:30 AM:
Hi, Shaofeng:
Your suggestion made some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder with the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Friday, November 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that “${hdp.version}” was not resolved somehow. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
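For what it's worth, the "bad substitution" itself is ordinary bash behavior, not a Spark or YARN bug: a dot is not a legal character in a shell parameter name, so `${hdp.version}` can never be expanded by the launch script and must be replaced with a literal version before the script runs. A minimal reproduction, safe to run anywhere:

```shell
# bash rejects ${hdp.version} because "hdp.version" is not a valid
# parameter name; this prints the same "bad substitution" error that
# appears in launch_container.sh.
bash -c 'echo /usr/hdp/${hdp.version}/hadoop'
```

This is why the fix has to happen on the submission side (java-opts or the -Dhdp.version extraJavaOptions), so the classpath YARN renders into launch_container.sh no longer contains the literal placeholder.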
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build cubes in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Friday, November 30, 2018 at 9:57 AM:
Hi
I took a look on the Internet and found these links; give them a try and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thursday, November 29, 2018 at 3:11 PM:
We are running kylin 2.5.1. For a specific cube, the build for one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Chao: (I hope I got your first name correctly.)
Thanks for the reply. I have recognized that KYLIN-3699 was opened to address this problem.
I believe no bug has been opened to address the problem that only SEQUENCEFILE is supported for the Spark cube build. Right?
Kang-sen
From: Chao Long <wa...@qq.com>
Sent: Sunday, December 09, 2018 11:50 AM
To: user <us...@kylin.apache.org>
Subject: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
There is a known JIRA issue about Spark cubing failing at step 7 with no input data.
https://issues.apache.org/jira/browse/KYLIN-3699
------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu" <kl...@anovadata.com>;
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org";
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
I am able to build the cube with Spark. I am using kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.
I need to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
In addition, if I build a cube at a time when there is no input data, the cube build will fail at step 7. Otherwise, it works OK.
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
The Spark cube build does not correctly support non-SEQUENCEFILE formats.
In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
Then restarted kylin.
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds
The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I compared the spark-submit cmd logged in my kylin.log file with the one included in the Kylin doc “Build Cube with Spark”, and I can see that my cmd is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.
Here is my cmd:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput 
hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the cmd in the Kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
My question is “what config parameter can cause this difference”?
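One hedged guess, untested and with an assumed file path: Kylin forwards any kylin.engine.spark-conf.* entry to spark-submit as a --conf, and Spark's standard spark.yarn.dist.files key is the conf equivalent of the --files flag, so something like the following in kylin.properties might ship hbase-site.xml the way the doc's cmd does:

```properties
# Untested sketch: ship the HBase client config with the Spark job.
# The path below is an assumption; use your cluster's real hbase-site.xml.
kylin.engine.spark-conf.spark.yarn.dist.files=/etc/hbase/conf/hbase-site.xml
```

This is only a sketch of the mechanism; the actual reason the doc's cmd includes --files may be a version or packaging difference.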
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is DERBY:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean that somehow the cube-building Spark job does not pick up hive-site.xml from the …/spark/conf dir?
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.
I searched the Internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
And restarted kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark not able to find the Hive metastore location?
If you have a pointer to a complete example of hive-site.xml for a Spark + Hive application, it would be greatly appreciated.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear, so do some searching on Spark + Hive.
If possible, we suggest using the sequence file (the default config) for the intermediate Hive table.
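In kylin.properties that suggestion corresponds to a single property; a sketch of the default setting (the property name is the one shown later in this thread):

```properties
# Default flat-table storage format; with SEQUENCEFILE Kylin reads the
# intermediate table directly and does not need the Hive catalog in Spark.
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```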
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Monday, December 3, 2018, at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Saturday, December 1, 2018, at 12:30 AM:
Hi, SHaofeng:
Your suggestion made some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Friday, November 30, 2018, at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that “${hdp.version}” was not resolved somehow. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build the cube on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Friday, November 30, 2018, at 9:57 AM:
Hi
I took a look on the Internet and found these links; give them a try, and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thursday, November 29, 2018, at 3:11 PM:
We are running Kylin 2.5.1. For a specific cube, the build for one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
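[Editor's note] The "bad substitution" above comes from YARN's launch_container.sh referencing ${hdp.version}, which some HDP setups never expand. A common workaround, sketched here and not verified on this cluster, is to pass a concrete HDP version through Kylin's Spark pass-through properties; the -Dhdp.version options below match the ones that appear in the spark-submit command later in this thread, and 2.5.6.0-40 is this cluster's HDP version, so substitute your own:

```properties
# kylin.properties: pass a concrete HDP version to the driver, the YARN AM,
# and the executors so ${hdp.version} in launch_container.sh resolves.
# 2.5.6.0-40 is the HDP version used on this cluster; substitute yours.
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
```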
Reply: RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Chao Long <wa...@qq.com>.
Hi KangSen,
There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data.
https://issues.apache.org/jira/browse/KYLIN-3699
------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu"<kl...@anovadata.com>;
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org"<us...@kylin.apache.org>;
主题: RE: anybody used spark to build cube in kylin 2.5.1?
I am able to build a cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.
I need to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.
In addition, if I build a cube when there is no input data, the build fails at step 7. Otherwise, it works OK.
Thanks.
Kang-sen
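[Editor's note] For readers skimming the thread, the fix described above reduces to a single kylin.properties change (Kylin must be restarted for it to take effect):

```properties
# kylin.properties: store the intermediate flat table as a sequence file,
# which the Spark engine reads directly without going through the Hive catalog.
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```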
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
The Spark cube build does not correctly support non-SEQUENCEFILE flat-table formats.
In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
Then restarted kylin.
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds
The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I compared the Spark execution command logged in my kylin.log file with the one included in the Kylin doc “Build Cube with Spark”, and I can see that my command is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”. Here is my command:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the command from the Kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
My question is: what config parameter could cause this difference?
Thanks.
Kang-sen
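[Editor's note] One way to get the missing --files option into Kylin's generated spark-submit command, sketched here and not verified against this cluster, is the kylin.engine.spark-conf.* pass-through together with Spark's standard spark.yarn.dist.files property. The hbase-site.xml path below is an example; use your cluster's actual location:

```properties
# kylin.properties: ship hbase-site.xml to the YARN containers,
# the equivalent of spark-submit's --files option.
# The path below is an example; point it at your cluster's hbase-site.xml.
kylin.engine.spark-conf.spark.yarn.dist.files=/etc/hbase/conf/hbase-site.xml
```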
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>Thrift URI for the remote metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>Location of the default database for the Hive warehouse</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is Derby:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf directory?
Kang-sen
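[Editor's note] When Spark reports "underlying DB is DERBY", it has fallen back to an embedded metastore instead of the remote one, which usually means hive.metastore.uris was not picked up. A minimal spark/conf/hive-site.xml sketch, using the thrift URI already configured earlier in this thread, would be:

```xml
<!-- spark/conf/hive-site.xml: minimal sketch pointing Spark at the remote
     Hive metastore; the URI is the one configured on this cluster. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
</configuration>
```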
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to give Spark access to the Hive table that was built by Kylin.
I searched the internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Then I restarted Kylin, but I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark not able to find the Hive metastore location?
If you have a pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear. Also try searching on Spark + Hive integration.
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Monday, December 3, 2018 at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and try to resume the cube rebuild.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Saturday, December 1, 2018 at 12:30 AM:
Hi, Shaofeng:
Your suggestion made some progress. Step 3 of the cube build now goes further but shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> 于2018年11月30日周五 下午9:04写道:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is that “${hdp.version}” was somehow not resolved. It seems that kylin.properties settings like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
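For what it's worth, the workaround Shaofeng describes in his reply can be sketched like this (a sketch only; the KYLIN_HOME default and scratch path here are assumptions, not part of the original mails):

```shell
# Sketch: pin hdp.version for the Spark driver/AM via a "java-opts" file in
# Spark's conf dir, per Shaofeng's suggestion. KYLIN_HOME defaults to a
# scratch dir here; point it at the real Kylin install on your node.
KYLIN_HOME=${KYLIN_HOME:-/tmp/kylin-demo}
mkdir -p "$KYLIN_HOME/spark/conf"
echo "-Dhdp.version=2.5.6.0-40" > "$KYLIN_HOME/spark/conf/java-opts"
cat "$KYLIN_HOME/spark/conf/java-opts"
```

On an HDP node the exact version string can usually be read with `ls /usr/hdp`; it must match exactly, or the `${hdp.version}` placeholders in YARN's classpath will still fail with "bad substitution".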
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build cubes on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> 于2018年11月30日周五 上午9:57写道:
Hi
I took a look on the Internet and found these links; give them a try, and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> 于2018年11月29日周四 下午3:11写道:
We are running Kylin 2.5.1. For one specific cube, building one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Shaofeng:
I compared the Spark execution cmd logged in my kylin.log file with the one in the Kylin doc “Build Cube with Spark”, and I can see that my cmd is missing this option:
“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”.
Here is my cmd:
2018-12-06 11:50:02,665 INFO [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.cores=1 --conf spark.hadoop.yarn.timeline-service.enabled=false --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.master=yarn --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true --conf spark.executor.instances=40 --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.executor.memory=4G --conf spark.yarn.queue=default --conf spark.submit.deployMode=cluster --conf spark.dynamicAllocation.minExecutors=1 --conf spark.network.timeout=600 --conf spark.hadoop.dfs.replication=2 --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history --conf spark.driver.memory=2G --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar --conf spark.dynamicAllocation.maxExecutors=1000 --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput 
hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata
And this is the cmd in the Kylin doc:
2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
My question is “what config parameter can cause this difference”?
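For illustration, the missing option just ships cluster config files alongside the job; expressed as a bare spark-submit it would look roughly like this (a non-runnable sketch; the config paths and the trailing arguments are assumptions/elisions, not the exact Kylin-generated command):

```shell
# Sketch only: shipping hbase-site.xml (and hive-site.xml) to the Spark
# AM/executors with --files, as the doc's cmd does. Paths are assumptions;
# substitute your cluster's actual locations.
spark-submit \
  --class org.apache.kylin.common.util.SparkEntry \
  --master yarn --deploy-mode cluster \
  --files /etc/hbase/conf/hbase-site.xml,/etc/hive/conf/hive-site.xml \
  "$KYLIN_HOME/lib/kylin-job-2.5.1.jar" ...
```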
Thanks.
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>Thrift URI for the remote metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>Location of default database for the warehouse</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is Derby:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
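As a sanity check on that theory: the hive-site.xml that Spark actually reads must carry hive.metastore.uris, and when it does, the Spark-side Hive client should connect to the remote Thrift metastore instead of falling back to an embedded Derby store (the "underlying DB is DERBY" line). A minimal sketch, with the host/port copied from the mail and KYLIN_HOME/scratch paths as assumptions:

```shell
# Sketch: write a minimal hive-site.xml with only the remote metastore URI
# into the conf dir of the spark that Kylin invokes, then confirm it is
# there. When this file is picked up by the job, the stderr should no
# longer show "underlying DB is DERBY".
KYLIN_HOME=${KYLIN_HOME:-/tmp/kylin-demo}
mkdir -p "$KYLIN_HOME/spark/conf"
cat > "$KYLIN_HOME/spark/conf/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
</configuration>
EOF
grep 'thrift://' "$KYLIN_HOME/spark/conf/hive-site.xml"
```

Note the file must sit in the conf dir of the spark-submit Kylin actually runs (here $KYLIN_HOME/spark/conf), since in cluster mode the driver runs inside a YARN container and only sees what spark-submit ships to it.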
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.
I searched the internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
And restarted Kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark unable to find the Hive metastore location?
If you have a pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
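For reference, a minimal hive-site.xml for Spark's conf directory usually only needs to point at the remote metastore; this is a sketch, with the host and warehouse path taken from the config quoted later in this thread, so substitute your own values:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point Spark's embedded Hive client at the remote metastore
       instead of letting it fall back to a local Derby-backed one. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/apps/hive/warehouse</value>
  </property>
</configuration>
```

The "underlying DB is DERBY" and local "spark-warehouse" lines in the logs above suggest Spark was creating its own embedded metastore rather than connecting to the cluster's, which is consistent with this setting not being picked up.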
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Please double-check it; the error message is clear, so some searching on Spark + Hive should also help.
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
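For reference, the storage-format setting in question lives in kylin.properties; if I read the shipped configuration correctly, reverting to the default would look something like this (a sketch, not verified against this cluster):

```properties
# Default in Kylin 2.5.x: with SEQUENCEFILE the Spark engine reads the
# intermediate flat table directly from HDFS, bypassing the Hive catalog
# (and therefore not requiring hive-site.xml in spark/conf).
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```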
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Monday, December 3, 2018 at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build:
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Saturday, December 1, 2018 at 12:30 AM:
Hi, Shaofeng:
Your suggestion made some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
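For the HDP 2.5.6 stack used earlier in this thread, the same file would presumably look like this (the version string is the one already set in Kang-sen's kylin.properties; adjust it to your own stack):

```
# $SPARK_HOME/conf/java-opts
-Dhdp.version=2.5.6.0-40
```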
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Friday, November 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that "${hdp.version}" was not resolved somehow. It seems that kylin.properties parameters like "extraJavaOptions=-Dhdp.version=2.5.6.0-40" were not enough.
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build the cube on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Friday, November 30, 2018 at 9:57 AM:
Hi
I searched the Internet and found these links; give them a try and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thursday, November 29, 2018 at 3:11 PM:
We are running Kylin 2.5.1. For one specific cube, the build for one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Shaofeng:
I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://anovadata6.anovadata.local:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/apps/hive/warehouse</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
But in the Spark run's stderr, I still see that Spark thinks the metastore is Derby:
18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
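One generic way to sanity-check what a given hive-site.xml actually contains is to parse it directly. This is only a sketch; the file path in the `__main__` block is an assumption, so point it at your actual spark/conf copy:

```python
# Sketch: parse a hive-site.xml and report selected properties, to verify
# the copy under spark/conf really carries the remote-metastore settings.
import os
import xml.etree.ElementTree as ET

def hive_properties(path):
    """Return a {name: value} dict for every <property> in a hive-site.xml."""
    root = ET.parse(path).getroot()
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }

if __name__ == "__main__":
    path = "/usr/local/spark/conf/hive-site.xml"  # assumed location; adjust
    if os.path.exists(path):
        # A thrift:// URI here means Spark should reach the remote metastore
        # rather than falling back to an embedded Derby one.
        print(hive_properties(path).get("hive.metastore.uris"))
```

If this does not print your thrift:// URI, the file Spark sees is not the one you edited.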
Kang-sen
From: Kang-Sen Lu <kl...@anovadata.com>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?
Hi, Shaofeng:
I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.
I searched the internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
And restarted Kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is why spark is not able to find the hive metastore location?
If you have a pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear, and do some searching on Spark + Hive integration.
If possible, we suggest using the sequence file (the default config) for the intermediate Hive table.
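For reference, the storage format of Kylin's intermediate flat table is controlled by a single kylin.properties setting; a sketch of reverting it to the sequence-file default follows. The SEQUENCEFILE value is an assumption inferred from the TEXTFILE setting reported in this thread, so verify it against your Kylin version's documentation:

```properties
# kylin.properties: let the intermediate flat table use the sequence-file
# format so the Spark engine can read it directly, without going through
# the Hive catalog (value assumed; check your Kylin version's default).
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```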
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Mon, Dec 3, 2018 at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and try to resume the cube rebuild.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube-build still failed, the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Sat, Dec 1, 2018 at 12:30 AM:
Hi, SHaofeng:
Your suggestion made some progress. Now step 3 of the cube build goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' setting, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
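A minimal sketch of that step follows. Note the hedges: the real target is $SPARK_HOME/conf/java-opts, and the version string must come from your own cluster (e.g. from `ls /usr/hdp`); the sketch writes into a temporary directory so it is safe to run anywhere:

```shell
# Write a java-opts file containing the HDP stack version so that
# ${hdp.version} can be resolved in the YARN container classpath.
# conf_dir stands in for $SPARK_HOME/conf; adjust for a real install.
conf_dir="$(mktemp -d)"
printf -- '-Dhdp.version=2.5.6.0-40\n' > "$conf_dir/java-opts"
cat "$conf_dir/java-opts"
```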
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Fri, Nov 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is that “${hdp.version}” is not being resolved somehow. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
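A workaround often suggested for this "bad substitution" symptom on HDP is to pin the version in a java-opts file under the Spark conf directory, so the YARN launch script never sees the unresolved ${hdp.version} placeholder. A sketch only; the path assumes the Kylin-bundled Spark at $KYLIN_HOME/spark, and the version string is the 2.5.6.0-40 build from the kylin.properties settings above:

```
# $KYLIN_HOME/spark/conf/java-opts -- sketch; path and version are assumptions
# taken from the -Dhdp.version settings in kylin.properties above
-Dhdp.version=2.5.6.0-40
```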
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build the cube on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Fri, Nov 30, 2018 at 9:57 AM:
Hi
I took a look around the Internet and found these links; give them a try, and I hope it helps.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thu, Nov 29, 2018 at 3:11 PM:
We are running Kylin 2.5.1. For one specific cube, building one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Shaofeng:
I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.
I searched the Internet for Spark and Hive integration, but failed to find a concrete example.
Anyway, I updated my kylin/spark/conf/hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
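One hedged observation: javax.jdo.option.ConnectionURL is the JDBC URL of the metastore's backing database, and a jdbc:hive2:// URL is a HiveServer2 endpoint rather than a metastore database, so this setting may leave Spark falling back to a local Derby metastore. For Spark to reach an existing remote metastore, hive-site.xml typically sets hive.metastore.uris instead; a minimal sketch, where the host and the conventional port 9083 are placeholders for this cluster's actual metastore service:

```xml
<property>
  <name>hive.metastore.uris</name>
  <!-- placeholder: point this at the cluster's running Hive metastore service -->
  <value>thrift://anovadata6.anovadata.local:9083</value>
</property>
```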
And restarted Kylin. But I still get the following error:
18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
My question is: why is Spark not able to find the Hive metastore location?
If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Just double-check it; the error message is clear, so do some searching on Spark + Hive.
If possible, we suggest using the sequence file (default config) for the intermediate hive table.
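For reference, the sequence-file default corresponds to this kylin.properties setting (a sketch; verify against your installation's conf file):

```
# Default flat-table format; the Spark engine reads sequence files directly
# without going through the Hive catalog (so no hive-site.xml is needed in spark/conf)
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```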
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Mon, Dec 3, 2018 at 9:33 PM:
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use the Hive catalog to parse the data into an RDD. In that case, it needs "hive-site.xml" in the spark/conf folder. Please confirm whether this is the case; if so, put the file there and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Sat, Dec 1, 2018 at 12:30 AM:
Hi, Shaofeng:
Your suggestion made some progress. Step 3 of the cube build now goes further and shows another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
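A minimal sketch of creating that file, following Shaofeng's suggestion. The conf directory and version string below are placeholders, not values from this thread — use your own spark/conf location and your cluster's actual HDP build (for example as reported by `hdp-select`):

```shell
# Sketch: write spark/conf/java-opts so the Spark JVMs receive -Dhdp.version.
# SPARK_CONF_DIR and the version string are example values (assumptions).
SPARK_CONF_DIR="${SPARK_CONF_DIR:-./spark/conf}"
mkdir -p "$SPARK_CONF_DIR"
echo "-Dhdp.version=2.4.0.0-169" > "$SPARK_CONF_DIR/java-opts"
cat "$SPARK_CONF_DIR/java-opts"
```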
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Fri, Nov 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that “${hdp.version}” was not resolved. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
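That suspicion can be reproduced directly: `${hdp.version}` is a Java system-property name, and a dot is not legal in a shell variable name, so bash itself throws "bad substitution" when launch_container.sh is executed with the placeholder still unexpanded. A minimal demo of the same failure:

```shell
# bash cannot expand ${hdp.version} (dots are invalid in shell identifiers),
# so the expansion aborts with "bad substitution" -- the same error seen in
# launch_container.sh when the placeholder is never substituted upstream.
if bash -c 'echo "${hdp.version}"' 2>/dev/null; then
  echo "expanded (would not happen)"
else
  echo "bad substitution reproduced"
fi
```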
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build the cube on an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties:
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Fri, Nov 30, 2018 at 9:57 AM:
Hi
I searched around and found these links; please give them a try, and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thu, Nov 29, 2018 at 3:11 PM:
We are running Kylin 2.5.1. For one specific cube, building one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.
I selected Spark in the cube design, advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen
Re: anybody used spark to build cube in kylin 2.5.1?
Posted by ShaoFeng Shi <sh...@apache.org>.
Just double-check it; the error message is clear, so please search a bit on
Spark + Hive.
If possible, we suggest using the sequence file format (the default config) for
the intermediate Hive table.
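For reference, a sketch of the relevant kylin.properties line. The thread later shows it overridden to TEXTFILE; the SEQUENCEFILE value below is my reading of the default Shaofeng refers to, under which Spark reads the intermediate table directly without needing hive-site.xml in spark/conf:

```
## assumed default in kylin.properties (the thread overrides it to TEXTFILE)
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
```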
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Mon, Dec 3, 2018 at 9:33 PM:
> Hi, Shaofeng:
>
>
>
> Thanks for the reply.
>
>
>
> This is a line in my kylin.properties:
>
>
>
> kylin.source.hive.flat-table-storage-format=TEXTFILE
>
>
>
> I copied hive-site.xml into spark/conf and try to resume the cube rebuild.
>
> (cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
>
>
>
> The cube-build still failed, the stderr log is as follows:
>
>
>
> 18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
>
> 18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not
> found in metastore. hive.metastore.schema.verification is not enabled so
> recording the schema version 1.2.0
>
> 18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database
> default, returning NoSuchObjectException
>
> 18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in
> metastore
>
> 18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in
> metastore
>
> 18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin
> role, since config is empty
>
> 18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
>
> 18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_all_databases
>
> 18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions:
> db=default pat=*
>
> 18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_functions: db=default pat=*
>
> 18/12/03 08:27:02 INFO DataNucleus.Datastore: The class
> "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as
> "embedded-only" so does not have its own datastore table.
>
> 18/12/03 08:27:03 INFO session.SessionState: Created local directory:
> /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
>
> 18/12/03 08:27:03 INFO session.SessionState: Created local directory:
> /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
>
> 18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
>
> 18/12/03 08:27:03 INFO session.SessionState: Created local directory:
> /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
>
> 18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
>
> 18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive
> client (version 1.2.1) is
> file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
>
> 18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
>
> 18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_database: default
>
> 18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database:
> global_temp
>
> 18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_database: global_temp
>
> 18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database
> global_temp, returning NoSuchObjectException
>
> 18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command:
> zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
>
> 18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table :
> db=zetticsdw
> tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
>
> 18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics
> ip=unknown-ip-addr cmd=get_table : db=zetticsdw
> tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
>
>
> 18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw
> exception: java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'
> not found in database 'zetticsdw';
>
> java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'
> not found in database 'zetticsdw';
>
> at
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>
> at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
>
> Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
> Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'
> not found in database 'zetticsdw';
>
> at
> org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
>
> at
> org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
>
> at scala.Option.getOrElse(Option.scala:121)
>
> at
> org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>
> at
> org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at org.apache.spark.sql.hive.HiveExternalCatalog.org
> $apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>
> at
> org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
>
> at
> org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
>
> at
> org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
>
> at
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
>
> ... 6 more
>
> 18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception:
> java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef'
> not found in database 'zetticsdw';)
>
> 18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown
> hook
>
> 18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e
> {HTTP/1.1}{0.0.0.0:0}
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at org.apache.spark.sql.hive.HiveExternalCatalog.org
> $apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>
> at
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>
> at
> org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
>
> at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
>
> at
> org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
>
> at
> org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
>
> at
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
>
> ... 6 more
>
> 18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception:
> java.lang.RuntimeException: error execute
> org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view
> 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69'
> not found in database 'zetticsdw';)
>
> 18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown
> hook
>
>
>
> Kang-sen
>
>
>
> *From:* ShaoFeng Shi <sh...@apache.org>
> *Sent:* Friday, November 30, 2018 8:53 AM
> *To:* user <us...@kylin.apache.org>
> *Subject:* Re: anybody used spark to build cube in kylin 2.5.1?
>
>
>
> A solution is to put a "java-opts" file in the spark/conf folder, adding the
> 'hdp.version' setting, like this:
>
>
>
> cat /usr/local/spark/conf/java-opts
>
> -Dhdp.version=2.4.0.0-169
>
>
>
>
>
> Best regards,
>
>
>
> Shaofeng Shi 史少锋
>
> Apache Kylin PMC
>
> Work email: shaofeng.shi@kyligence.io
>
> Kyligence Inc: https://kyligence.io/
>
>
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>
> Join Kylin user mail group: user-subscribe@kylin.apache.org
>
> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
>
>
>
>
>
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> wrote on Fri, Nov 30, 2018 at 9:04 PM:
>
> Thanks for the reply from Yichen and Aron. This is my kylin.properties:
>
>
>
> kylin.engine.spark-conf.spark.yarn.archive=hdfs://
> 192.168.230.199:8020/user/zettics/spark/spark-libs.jar
>
>
> ##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
>
> #
>
> ## uncomment for HDP
>
>
> kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
> kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
> kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
>
>
>
> But I still get the same error.
>
>
>
> Stack trace: ExitCodeException exitCode=1:
> /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:848)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
>
>
>
> I also saw in stderr:
>
>
>
> Log Type: stderr
>
> Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
>
> Log Length: 88
>
> Error: Could not find or load main class
> org.apache.spark.deploy.yarn.ApplicationMaster
>
>
>
> I suspect my problem is related to the fact that "${hdp.version}" was not
> resolved somehow. It seems that kylin.properties settings like
> "extraJavaOptions=-Dhdp.version=2.5.6.0-40" were not enough.
>
>
>
> Kang-sen
>
>
>
>
>
>
>
>
>
>
>
> *From:* Yichen Zhou <zh...@gmail.com>
> *Sent:* Thursday, November 29, 2018 9:08 PM
> *To:* user@kylin.apache.org
> *Subject:* Re: anybody used spark to build cube in kylin 2.5.1?
>
>
>
> Hi Kang-Sen,
>
>
>
> I think Jiatao is right. If you want to use Spark to build cubes on an HDP
> cluster, you need to configure -Dhdp.version in
> $KYLIN_HOME/conf/kylin.properties.
>
> ## uncomment for HDP
>
> #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
>
> #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
>
> #kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
>
> Please refer to this:
> http://kylin.apache.org/docs/tutorial/cube_spark.html
>
>
>
> Regards,
>
> Yichen
>
>
>
>
>
> JiaTao Tao <ta...@gmail.com> wrote on Fri, Nov 30, 2018 at 9:57 AM:
>
> Hi
>
>
>
> I searched around and found these links; give them a try and I hope they
> help.
>
>
>
>
> https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
>
>
>
>
> https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
>
>
>
> --
>
>
>
> Regards!
>
> Aron Tao
>
>
>
>
>
> Kang-Sen Lu <kl...@anovadata.com> wrote on Thu, Nov 29, 2018 at 3:11 PM:
>
> We are running Kylin 2.5.1. For one specific cube, building one hour of data
> took 200 minutes, so I am considering building the cube with Spark instead
> of MapReduce.
>
>
>
> I selected Spark on the cube's "Advanced Setting" page.
>
>
>
> The cube build failed at step 3, with the following error log:
>
>
>
> OS command error exit with return code: 1, error message: 18/11/29
> 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at
> anovadata6.anovadata.local/192.168.230.199:8050
>
> 18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from
> cluster with 1 NodeManagers
>
> 18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not
> requested more than the maximum memory capability of the cluster (191488 MB
> per container)
>
> 18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432
> MB memory including 384 MB overhead
>
> 18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context
> for our AM
>
> 18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for
> our AM container
>
> 18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM
> container
>
> 18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor
> spark.yarn.archive is set, falling back to uploading libraries under
> SPARK_HOME.
>
> 18/11/29 09:50:38 INFO yarn.Client: Uploading resource
> file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
>
> 18/11/29 09:50:39 INFO yarn.Client: Uploading resource
> file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
>
> 18/11/29 09:50:39 WARN yarn.Client: Same path resource
> file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar
> added multiple times to distributed cache.
>
> 18/11/29 09:50:39 INFO yarn.Client: Uploading resource
> file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip
> ->
> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
>
> 18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will
> not take effect in cluster mode
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to:
> zettics
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to:
> zettics
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups
> to:
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups
> to:
>
> 18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(zettics); groups with view permissions: Set(); users with modify
> permissions: Set(zettics); groups with modify permissions: Set()
>
> 18/11/29 09:50:39 INFO yarn.Client: Submitting application
> application_1543422353836_0088 to ResourceManager
>
> 18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application
> application_1543422353836_0088
>
> 18/11/29 09:50:40 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:40 INFO yarn.Client:
>
> client token: N/A
>
> diagnostics: AM container is launched, waiting for AM container to
> Register with RM
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1543503039903
>
> final status: UNDEFINED
>
> tracking URL:
> http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
>
> user: zettics
>
> 18/11/29 09:50:41 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:42 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: ACCEPTED)
>
> 18/11/29 09:50:43 INFO yarn.Client: Application report for
> application_1543422353836_0088 (state: FAILED)
>
> 18/11/29 09:50:43 INFO yarn.Client:
>
> client token: N/A
>
> diagnostics: Application application_1543422353836_0088 failed 2
> times due to AM Container for appattempt_1543422353836_0088_000002 exited
> with exitCode: 1
>
> For more detailed output, check the application tracking page:
> http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088
> Then click on links to logs of each attempt.
>
> Diagnostics: Exception from container-launch.
>
> Container id: container_e05_1543422353836_0088_02_000001
>
> Exit code: 1
>
> Exception message:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> Stack trace: ExitCodeException exitCode=1:
> /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh:
> line 26:
> $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:
> bad substitution
>
>
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:848)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
>
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> Thanks.
>
>
>
> Kang-sen
>
>
>
>
>
RE: anybody used spark to build cube in kylin 2.5.1?
Posted by Kang-Sen Lu <kl...@anovadata.com>.
Hi, Shaofeng:
Thanks for the reply.
This is a line in my kylin.properties:
kylin.source.hive.flat-table-storage-format=TEXTFILE
I copied hive-site.xml into spark/conf and tried to resume the cube build:
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)
The cube build still failed; the stderr log is as follows:
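The copy above can be wrapped in a small helper that also verifies the file landed where Spark will look for it. This is only a sketch; the `copy_hive_site` name is made up for illustration, and the example paths in the comments (from this thread's HDP 2.5.6.0-40 install) must be adjusted to your own environment.

```shell
# Sketch only: copy a hive-site.xml into a Spark conf dir and confirm it landed.
# The function name and the example paths are illustrative, not from Kylin.
copy_hive_site() {
  src=$1          # e.g. /etc/hive2/2.5.6.0-40/0/hive-site.xml
  spark_conf=$2   # e.g. $KYLIN_HOME/spark/conf
  cp "$src" "$spark_conf/hive-site.xml" || return 1
  # Verify the copy exists so the Spark job has a chance of picking it up.
  [ -f "$spark_conf/hive-site.xml" ] && echo "copied to $spark_conf/hive-site.xml"
}
```

Invoked as `copy_hive_site /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf`, it reproduces the cp above and prints the destination on success.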
18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}
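Two log lines above make the symptom visible: the warehouse location still points at a file:/ path under the YARN container directory, and the earlier run also showed "underlying DB is DERBY". Both usually mean Spark fell back to its embedded local metastore instead of the cluster's Hive metastore, i.e. hive-site.xml was not picked up by the job. A hypothetical helper (the name and logic are illustrative, not part of Kylin) that flags this in a stderr log:

```shell
# Sketch only: scan a Spark AM stderr log for signs of the embedded metastore
# (a file:/ warehouse path, or a Derby-backed metastore), which suggest that
# the cluster's hive-site.xml was never loaded by the job.
check_metastore() {
  log=$1
  if grep -q 'Warehouse location.*file:/' "$log" \
     || grep -q 'underlying DB is DERBY' "$log"; then
    echo "local metastore in use: hive-site.xml not picked up"
  else
    echo "hive metastore looks configured"
  fi
}
```

Run over the stderr above, it would report the local metastore, since the warehouse location line contains a file:/ path.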
From: ShaoFeng Shi <sh...@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-sen,
When the intermediate table's file format is not sequence file, Kylin will use Hive catalog to parse the data into RDD. In this case, it needs the "hive-site.xml" in spark/conf folder. Please confirm whether it is this case, if true, put the file and then try again.
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Sat, Dec 1, 2018 at 12:30 AM:
Hi, SHaofeng:
Your suggestion made some progress. Now the step3 of cube build go further and showed another problem. Here is the stderr log:
18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics ip=unknown-ip-addr cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
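The missing piece here, per Shaofeng's earlier reply, is hive-site.xml in Spark's conf directory: without it Spark falls back to a local embedded catalog (note the container-local spark-warehouse path in the log above) and cannot see the intermediate Hive table. A sketch of the fix, using temporary directories as stand-ins for /etc/hive/conf and $SPARK_HOME/conf since the actual paths vary by installation:

```shell
# Stand-in directories; on a real HDP cluster these would typically be
# /etc/hive/conf and $SPARK_HOME/conf.
HIVE_CONF=$(mktemp -d)
SPARK_CONF=$(mktemp -d)

# A placeholder hive-site.xml; the real file carries the metastore settings
# (e.g. hive.metastore.uris) that Spark needs to locate Hive tables.
echo '<configuration/>' > "$HIVE_CONF/hive-site.xml"

# The actual fix: copy Hive's client config into Spark's conf folder.
cp "$HIVE_CONF/hive-site.xml" "$SPARK_CONF/"
ls "$SPARK_CONF"
```

After copying the file, resume or resubmit the cube build so the Spark job picks up the metastore configuration.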
Kang-sen
From: ShaoFeng Shi <sh...@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <us...@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' configuration, like this:
cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169
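That file can be created in one step; a sketch, writing to a temporary directory here since the Spark conf path and the HDP version string (the directory name under /usr/hdp) differ per cluster:

```shell
# Stand-in for your spark/conf directory (e.g. /usr/local/spark/conf).
SPARK_CONF_DIR=$(mktemp -d)

# The hdp.version value must match an actual directory under /usr/hdp on
# your cluster; 2.4.0.0-169 is only an example version string.
echo '-Dhdp.version=2.4.0.0-169' > "$SPARK_CONF_DIR/java-opts"
cat "$SPARK_CONF_DIR/java-opts"
```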
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org
Kang-Sen Lu <kl...@anovadata.com> wrote on Friday, November 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:
kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40
But I still get the same error.
Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
I also saw in stderr:
Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I suspect my problem is related to the fact that "${hdp.version}" was not resolved somehow. It seems that kylin.properties parameters like "extraJavaOptions=-Dhdp.version=2.5.6.0-40" were not enough.
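The "bad substitution" itself is reproducible outside Hadoop: bash rejects ${hdp.version} because '.' is not a valid character in a shell variable name, which is why launch_container.sh fails whenever the placeholder reaches the shell unresolved. A minimal demonstration:

```shell
# bash cannot expand ${hdp.version} (dots are illegal in variable names),
# so this fails with the same "bad substitution" error as launch_container.sh.
bash -c 'echo /usr/hdp/${hdp.version}/hadoop' 2>&1 | grep -o 'bad substitution'
# prints: bad substitution
```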
Kang-sen
From: Yichen Zhou <zh...@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?
Hi Kang-Sen,
I think Jiatao is right. If you want to use Spark to build a cube in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties.
## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html
Regards,
Yichen
JiaTao Tao <ta...@gmail.com> wrote on Friday, November 30, 2018 at 9:57 AM:
Hi
I searched the Internet and found these links; give them a try and I hope they help.
https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html
https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster
--
Regards!
Aron Tao
Kang-Sen Lu <kl...@anovadata.com> wrote on Thursday, November 29, 2018 at 3:11 PM:
We are running Kylin 2.5.1. For a specific cube, building one hour of data took 200 minutes, so I am considering building the cube with Spark instead of MapReduce.
I selected Spark in the cube design's advanced settings.
The cube build failed at step 3, with the following error log:
OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zettics); groups with view permissions: Set(); users with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1543503039903
final status: UNDEFINED
tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
at org.apache.hadoop.util.Shell.run(Shell.java:848)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks.
Kang-sen