You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Camille Marvin (JIRA)" <ji...@apache.org> on 2014/04/16 02:38:15 UTC

[jira] [Updated] (HIVE-6918) ALTER TABLE using embedded metastore fails with duplicate key violation in 'dbo.SERDES'

     [ https://issues.apache.org/jira/browse/HIVE-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Marvin updated HIVE-6918:
---------------------------------

    Description: 
An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore. They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE concurrently (on different tables) using the same metastore database, but with each job having an embedded metastore because they set hive.metastore.uris to the empty string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') LOCATION 'wasb://inputblob@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...

  was:
An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore. They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE concurrently (on different tables, I believe) using the same metastore database, but with each job having an embedded metastore because they set hive.metastore.uris to the empty string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 (
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') LOCATION 'wasb://instancespaceblob@advisor27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...



removing product-specific attributes from script, clarifying uncertainty (yes, it is separate tables)

> ALTER TABLE using embedded metastore fails with duplicate key violation in 'dbo.SERDES'
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-6918
>                 URL: https://issues.apache.org/jira/browse/HIVE-6918
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11.0
>         Environment: hive-0.11.0.1.3.7.0-01272
> HDInsight version: 2.1.4.0.661685
>            Reporter: Eric Hanson
>
> An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore. They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE concurrently (on different tables) using the same metastore database, but with each job having an embedded metastore because they set hive.metastore.uris to the empty string.
> The script looks like:
> set hive.metastore.uris=;
> ...
> CREATE EXTERNAL TABLE IF NOT EXISTS InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
> ...
> )
> PARTITIONED BY (tenant string, d string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> ;
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
> ... (several more like this);
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') LOCATION 'wasb://inputblob@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
> ... several more like the above (14 ALTER TABLE statements in a row)
> ...
> Then they get this error:
> ...
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> NestedThrowablesStackTrace:
> java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
> 	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
> 	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
> 	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
> 	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
> 	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
> 	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
> 	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
> 	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
> 	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
> ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)