Posted to issues@spark.apache.org by "Gengliang Wang (Jira)" <ji...@apache.org> on 2020/03/09 21:55:00 UTC

[jira] [Commented] (SPARK-31099) Create migration script for metastore_db

    [ https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055413#comment-17055413 ] 

Gengliang Wang commented on SPARK-31099:
----------------------------------------

cc [~yumwang] 

> Create migration script for metastore_db
> ----------------------------------------
>
>                 Key: SPARK-31099
>                 URL: https://issues.apache.org/jira/browse/SPARK-31099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When an existing Derby database (in ./metastore_db) was created by the Hive 1.2.x profile, it fails to upgrade itself to the Hive 2.3.x schema.
> Repro steps (a shell sketch of these steps follows the list):
> 1. Build OSS or DBR master with SBT using -Phive-1.2 -Phive -Phive-thriftserver. Make sure there's no existing ./metastore_db directory in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This populates the ./metastore_db directory, which hosts the Derby-based metastore database. The database is created with the Hive 1.2.x schema.
> 3. Re-build OSS or DBR master with SBT using -Phive -Phive-thriftserver (drops the Hive 1.2 profile, which makes the build use the default Hive 2.3 profile).
> 4. Repeat Step (2). Hive 2.3.x now loads the Derby database created in Step (2), which triggers a schema upgrade, and that's where the error below is reported.
> 5. Delete the ./metastore_db directory and re-run Step (4). The error is no longer reported.
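> As a sketch, the repro as shell commands (the exact build invocation is illustrative; adjust to your setup):
> {code:bash}
> # Step 1: start clean and build with the Hive 1.2 profile
> rm -rf metastore_db
> build/sbt -Phive-1.2 -Phive -Phive-thriftserver package
> # Step 2: running a query creates ./metastore_db with the Hive 1.2 schema
> echo 'spark.sql("show databases").show()' | bin/spark-shell
> # Step 3: rebuild with the default Hive 2.3 profile
> build/sbt -Phive -Phive-thriftserver package
> # Step 4: Hive 2.3 loads the 1.2-era Derby database and attempts the upgrade -> error below
> echo 'spark.sql("show databases").show()' | bin/spark-shell
> {code}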
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
> 	at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
> 	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
> 	at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
> 	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
> 	at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
> 	at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
> 	at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
> 	at org.datanucleus.store.query.Query.execute(Query.java:1726)
> 	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
> 	at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
> 	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
> 	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
> 	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> 	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
> 	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
> 	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:343)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:369)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:280)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:316)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:272)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:359)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:472)
> 	at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:267)
> 	at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:266)
> 	at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
> 	at org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:266)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:286)
> 	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:145)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:106)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:144)
> 	at com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
> 	at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:143)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:286)
> 	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
> 	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
> 	at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
> 	at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:62)
> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:94)
> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:94)
> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:270)
> 	at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:191)
> 	at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:43)
> 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
> 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
> 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
> 	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
> 	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3612)
> 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
> 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
> 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
> 	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
> 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
> 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3610)
> 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
> 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
> 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:662)
> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:657)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
> 	at $line50594476574342420814.$read$$iw$$iw$$iw.<init>(<console>:36)
> 	at $line50594476574342420814.$read$$iw$$iw.<init>(<console>:38)
> 	at $line50594476574342420814.$read$$iw.<init>(<console>:40)
> 	at $line50594476574342420814.$read.<init>(<console>:42)
> 	at $line50594476574342420814.$read$.<init>(<console>:46)
> 	at $line50594476574342420814.$read$.<clinit>(<console>)
> 	at $line50594476574342420814.$eval$.$print$lzycompute(<console>:7)
> 	at $line50594476574342420814.$eval$.$print(<console>:6)
> 	at $line50594476574342420814.$eval.$print(<console>)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
> 	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
> 	at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
> 	at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
> 	at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
> 	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
> 	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
> 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
> 	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
> 	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
> 	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
> 	at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
> 	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
> 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
> 	at org.apache.spark.repl.Main$.doMain(Main.scala:78)
> 	at org.apache.spark.repl.Main$.main(Main.scala:58)
> 	at org.apache.spark.repl.Main.main(Main.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: ERROR 42601: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.ColumnDefinitionNode.bindAndValidateDefault(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.TableElementList.validate(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.AlterTableNode.bindStatement(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
> 	... 157 more
> ...
> 20/03/09 13:57:05 ERROR ObjectStore: Version information found in metastore differs 1.2.0 from expected schema version 2.3.0. Schema verififcation is disabled hive.metastore.schema.verification
> 20/03/09 13:57:05 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore krismok@10.0.0.76
> {code}
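> The failing statement runs into the Derby rule stated in the first error line: Derby rejects ALTER TABLE ... ADD of a NOT NULL column unless a non-NULL DEFAULT is supplied. One possible manual workaround, sketched with Derby's ij tool (jar paths, the constraint name, and the DEFAULT value are illustrative; verify the syntax against your Derby version before running):
> {code:bash}
> # Connect ij to the embedded metastore_db (default APP schema assumed),
> # add the column with a DEFAULT, then attach the CHECK constraint separately:
> java -cp /path/to/derby.jar:/path/to/derbytools.jar org.apache.derby.tools.ij <<'EOF'
> CONNECT 'jdbc:derby:metastore_db';
> ALTER TABLE TBLS ADD COLUMN IS_REWRITE_ENABLED CHAR(1) NOT NULL DEFAULT 'N';
> ALTER TABLE TBLS ADD CONSTRAINT TBLS_REWRITE_CHECK CHECK (IS_REWRITE_ENABLED IN ('Y','N'));
> EOF
> {code}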
> It would be great if there were a migration script to upgrade the metastore_db from the older schema version to the new one.
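> For a standalone metastore, Hive already ships schematool for this; a sketch of how such an upgrade might be driven against the embedded Derby database (HIVE_HOME and the conf directory are illustrative, and the configured ConnectionURL must point at ./metastore_db):
> {code:bash}
> # Inspect the current schema version, then upgrade from the Hive 1.2 schema
> export HIVE_CONF_DIR=/path/to/conf   # hive-site.xml with jdbc:derby:;databaseName=metastore_db
> $HIVE_HOME/bin/schematool -dbType derby -info
> $HIVE_HOME/bin/schematool -dbType derby -upgradeSchemaFrom 1.2.0
> {code}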


