You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Ryan Pridgeon (JIRA)" <ji...@apache.org> on 2014/11/20 22:32:34 UTC
[jira] [Commented] (HIVE-8730) schemaTool failure when date
partition has non-date value
[ https://issues.apache.org/jira/browse/HIVE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220036#comment-14220036 ]
Ryan Pridgeon commented on HIVE-8730:
-------------------------------------
The schema tool also fails to update PART_NAME in table PARTITIONS. Without this Hive will enter an infinite loop when a user attempts to drop the table.
I would include the following Update statement in addition to handling none date type entries such as __HIVE_DEFAULT_PARTITION__
UPDATE PARTITIONS JOIN PARTITION_KEYS ON PARTITIONS.TBL_ID= PARTITION_KEYS.TBL_ID
SET PART_NAME=ifnull(REPLACE(PART_NAME,RIGHT(PART_NAME,8),cast(RIGHT(PART_NAME,8) as date)),PART_NAME)
WHERE PKEY_TYPE= 'date' and
PART_NAME not like '%-%-%';
Ref:http://github.mtv.cloudera.com/CDH/hive/blob/cdh5-0.13.1_5.2.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#1504
LINE 1480:
private List<Path> dropPartitionsAndGetLocations(RawStore ms, String dbName,
String tableName, Path tablePath, List<FieldSchema> partitionKeys, boolean checkLocation)
throws MetaException, IOException, NoSuchObjectException, InvalidObjectException,
InvalidInputException {
int partitionBatchSize = HiveConf.getIntVar(hiveConf,
ConfVars.METASTORE_BATCH_RETRIEVE_MAX);
Path tableDnsPath = null;
if (tablePath != null) {
tableDnsPath = wh.getDnsPath(tablePath);
}
List<Path> partPaths = new ArrayList<Path>();
Table tbl = ms.getTable(dbName, tableName);
// call dropPartition on each of the table's partitions to follow the
// procedure for cleanly dropping partitions.
while (true) {
List<Partition> partsToDelete = ms.getPartitions(dbName, tableName, partitionBatchSize);
if (partsToDelete == null || partsToDelete.isEmpty()) {
break;
}
List<String> partNames = new ArrayList<String>();
for (Partition part : partsToDelete) {
if (checkLocation && part.getSd() != null &&
part.getSd().getLocation() != null) {
Path partPath = wh.getDnsPath(new Path(part.getSd().getLocation()));
if (tableDnsPath == null ||
(partPath != null && !isSubdirectory(tableDnsPath, partPath))) {
if (!wh.isWritable(partPath.getParent())) {
throw new MetaException("Table metadata not deleted since the partition " +
Warehouse.makePartName(partitionKeys, part.getValues()) +
" has parent location " + partPath.getParent() + " which is not writable " +
"by " + hiveConf.getUser());
}
partPaths.add(partPath);
}
}
partNames.add(Warehouse.makePartName(tbl.getPartitionKeys(), part.getValues()));
}
ms.dropPartitions(dbName, tableName, partNames);
}
return partPaths;
}
> schemaTool failure when date partition has non-date value
> ---------------------------------------------------------
>
> Key: HIVE-8730
> URL: https://issues.apache.org/jira/browse/HIVE-8730
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0
> Environment: CDH5.2
> Reporter: Johndee Burks
> Assignee: Chaoyu Tang
> Priority: Minor
>
> If there is a none date value in the PART_KEY_VAL column within the PARTITION_KEY_VALS table in the metastore db, this will cause the HIVE-5700 script to fail. The failure will be picked up by the schemaTool causing the upgrade to fail. A classic example of a value that can be present without users really being aware is __HIVE_DEFAULT_PARTITION__ which is filled in by hive automatically when doing dynamic partitioning and value is not present in source data for the partition column.
> The reason for the failure is that the upgrade script does not account for none date values. What it is currently:
> {code}
> UPDATE PARTITION_KEY_VALS
> INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
> INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
> AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
> AND PARTITION_KEYS.PKEY_TYPE = 'date'
> SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
> {code}
> What it should be to avoid issue:
> {code}
> UPDATE PARTITION_KEY_VALS
> INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
> INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
> AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
> AND PARTITION_KEYS.PKEY_TYPE = 'date'
> AND PART_KEY_VAL != '__HIVE_DEFAULT_PARTITION__'
> SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
> {code}
> == Metastore DB
> {code}
> mysql> select * from PARTITION_KEY_VALS;
> +---------+----------------------------+-------------+
> | PART_ID | PART_KEY_VAL | INTEGER_IDX |
> +---------+----------------------------+-------------+
> | 171 | 2099-12-31 | 0 |
> | 172 | __HIVE_DEFAULT_PARTITION__ | 0 |
> | 184 | 2099-12-01 | 0 |
> | 185 | 2099-12-30 | 0 |
> +---------+----------------------------+-------------+
> {code}
> == stdout.log
> {code}
> 0: jdbc:mysql://10.16.8.121:3306/metastore> !autocommit on
> 0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT 'Upgrading MetaStore schema from 0.12.0 to 0.13.0' AS ' '
> +---------------------------------------------------+--+
> | |
> +---------------------------------------------------+--+
> | Upgrading MetaStore schema from 0.12.0 to 0.13.0 |
> +---------------------------------------------------+--+
> 0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT '< HIVE-5700 enforce single date format for partition column storage >' AS ' '
> +------------------------------------------------------------------------+--+
> | |
> +------------------------------------------------------------------------+--+
> | < HIVE-5700 enforce single date format for partition column storage > |
> +------------------------------------------------------------------------+--+
> 0: jdbc:mysql://10.16.8.121:3306/metastore> UPDATE PARTITION_KEY_VALS INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX AND PARTITION_KEYS.PKEY_TYPE = 'date' SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL)
> {code}
> == stderr.log
> {code}
> exec /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/hive-cli-0.13.1-cdh5.2.0.jar org.apache.hive.beeline.HiveSchemaTool -verbose -dbType mysql -upgradeSchema
> Connecting to jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8
> Connected to: MySQL (version 5.1.73)
> Driver: MySQL-AB JDBC Driver (version mysql-connector-java-5.1.17-SNAPSHOT ( Revision: ${bzr.revision-id} ))
> Transaction isolation: TRANSACTION_READ_COMMITTED
> Autocommit status: true
> 1 row selected (0.025 seconds)
> 1 row selected (0.004 seconds)
> Closing: 0: jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8
> org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
> org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
> at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:252)
> at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:220)
> at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:530)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Schema script failed, errorcode 2
> at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:410)
> at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:383)
> at org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:247)
> ... 7 more
> *** schemaTool failed ***
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)