You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Naveen Gangam (JIRA)" <ji...@apache.org> on 2017/06/06 20:57:18 UTC

[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

    [ https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039623#comment-16039623 ] 

Naveen Gangam commented on HIVE-16667:
--------------------------------------

[~rusanu] I have had a chance to test with VARCHAR in the JDO mappings file with Oracle, Postgres and Derby. Seems to be working without any issues.
I am uploading the patch with the changes. Could you please review it when you get a chance? Thanks

{code}
2017-06-06T15:50:41,716 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 280.51028ms + 0.084425ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"     and "TBLS"."TBL_NAME" = ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"      and "DBS"."NAME" = ? ]
2017-06-06T15:50:41,858 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 141.674264ms + 0.379869ms, the query is [select "PARTITIONS"."PART_ID", "SDS"."SD_ID", "SDS"."CD_ID", "SERDES"."SERDE_ID", "PARTITIONS"."CREATE_TIME", "PARTITIONS"."LAST_ACCESS_TIME", "SDS"."INPUT_FORMAT", "SDS"."IS_COMPRESSED", "SDS"."IS_STOREDASSUBDIRECTORIES", "SDS"."LOCATION", "SDS"."NUM_BUCKETS", "SDS"."OUTPUT_FORMAT", "SERDES"."NAME", "SERDES"."SLIB" from "PARTITIONS"  left outer join "SDS" on "PARTITIONS"."SD_ID" = "SDS"."SD_ID"   left outer join "SERDES" on "SDS"."SERDE_ID" = "SERDES"."SERDE_ID" where "PART_ID" in (1) order by "PART_NAME" asc]
2017-06-06T15:50:42,002 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 142.592725ms + 0.294649ms, the query is [select "PART_ID", "PARAM_KEY", "PARAM_VALUE" from "PARTITION_PARAMS" where "PART_ID" in (1) and "PARAM_KEY" is not null order by "PART_ID" asc]
2017-06-06T15:50:42,142 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 139.308419ms + 0.229592ms, the query is [select "PART_ID", "PART_KEY_VAL" from "PARTITION_KEY_VALS" where "PART_ID" in (1) and "INTEGER_IDX" >= 0 order by "PART_ID" asc, "INTEGER_IDX" asc]
2017-06-06T15:50:42,281 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 139.136081ms + 0.092378ms, the query is [select "SD_ID", "PARAM_KEY", "PARAM_VALUE" from "SD_PARAMS" where "SD_ID" in (2) and "PARAM_KEY" is not null order by "SD_ID" asc]
2017-06-06T15:50:42,423 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 141.601391ms + 0.08076ms, the query is [select "SD_ID", "COLUMN_NAME", "SORT_COLS"."ORDER" from "SORT_COLS" where "SD_ID" in (2) and "INTEGER_IDX" >= 0 order by "SD_ID" asc, "INTEGER_IDX" asc]
2017-06-06T15:50:42,564 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 140.216113ms + 0.114448ms, the query is [select "SD_ID", "BUCKET_COL_NAME" from "BUCKETING_COLS" where "SD_ID" in (2) and "INTEGER_IDX" >= 0 order by "SD_ID" asc, "INTEGER_IDX" asc]
2017-06-06T15:50:42,703 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 139.086542ms + 0.094296ms, the query is [select "SD_ID", "SKEWED_COL_NAME" from "SKEWED_COL_NAMES" where "SD_ID" in (2) and "INTEGER_IDX" >= 0 order by "SD_ID" asc, "INTEGER_IDX" asc]
2017-06-06T15:50:42,847 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 142.786353ms + 0.433393ms, the query is [select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX" >= 0 order by "CD_ID" asc, "INTEGER_IDX" asc]
2017-06-06T15:50:42,990 DEBUG [pool-7-thread-4] metastore.MetaStoreDirectSql: Direct SQL query in 142.115654m
{code}

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16667
>                 URL: https://issues.apache.org/jira/browse/HIVE-16667
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Remus Rusanu
>            Assignee: Naveen Gangam
>         Attachments: HIVE-16667.patch, HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with PostgreSQL. The value is written out-of-band and the LOB handle is written,as an INT, into the table. SELECTs return the INT value, which should had been read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier metastore versions (they retain their string storage) vs. values inserted after the upgrade (inserted as LOB roots).
> Teh code in {{MetasoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does not happen, the value is a Java String containing the int which is the LOB root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but that is just the luck of the path taken by the code. Inspection of my PG metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)