Posted to issues@spark.apache.org by "weiliang hao (Jira)" <ji...@apache.org> on 2022/12/05 07:23:00 UTC
[jira] [Commented] (SPARK-41241) Use Hive and Spark SQL to modify table field comment, the modified results of Hive cannot be queried using Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-41241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643141#comment-17643141 ]
weiliang hao commented on SPARK-41241:
--------------------------------------
[~xkrogen] The problem is the following sequence: Spark modifies a column comment on a Hive table, Hive then modifies the same comment again, and Spark still returns the old comment instead of the latest one. I think Spark should be compatible with Hive here, so that querying the table through either the Spark or the Hive engine gives consistent results.
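To make the sequence concrete, here is a small toy model in Python. It is only a sketch under an assumption this ticket does not confirm: that Spark keeps its own copy of the table schema (including column comments) alongside the metastore's native column list, and prefers that copy when describing the table. All names here (ToyMetastoreTable, spark_schema_copy, and the helper functions) are hypothetical and are not Spark or Hive APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyMetastoreTable:
    # Column comments as held in the metastore's native column list.
    hive_column_comments: Dict[str, str] = field(default_factory=dict)
    # Hypothetical stand-in for a second, Spark-owned copy of the schema.
    # None means Spark has never written the table's schema.
    spark_schema_copy: Optional[Dict[str, str]] = None


def hive_alter_comment(table: ToyMetastoreTable, column: str, comment: str) -> None:
    # Hive updates only the native column metadata.
    table.hive_column_comments[column] = comment


def spark_alter_comment(table: ToyMetastoreTable, column: str, comment: str) -> None:
    # Assumption: Spark updates the native metadata and also saves its own copy.
    table.hive_column_comments[column] = comment
    table.spark_schema_copy = dict(table.hive_column_comments)


def hive_describe(table: ToyMetastoreTable) -> Dict[str, str]:
    return dict(table.hive_column_comments)


def spark_describe(table: ToyMetastoreTable) -> Dict[str, str]:
    # Assumption: the Spark-owned copy wins whenever it exists.
    if table.spark_schema_copy is not None:
        return dict(table.spark_schema_copy)
    return dict(table.hive_column_comments)


def replay_ticket_steps() -> ToyMetastoreTable:
    # Same order of statements as in the issue description.
    table = ToyMetastoreTable({"id": ""})
    hive_alter_comment(table, "id", "hive comment")
    spark_alter_comment(table, "id", "spark comment")
    hive_alter_comment(table, "id", "hive new comment")
    return table


if __name__ == "__main__":
    t = replay_ticket_steps()
    print("hive :", hive_describe(t))
    print("spark:", spark_describe(t))
```

In this model the two engines diverge as soon as Hive writes after Spark, which matches the symptom reported here. If the real cause is different, the model does not apply.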
> Use Hive and Spark SQL to modify table field comment, the modified results of Hive cannot be queried using Spark SQL
> --------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-41241
> URL: https://issues.apache.org/jira/browse/SPARK-41241
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
> Reporter: weiliang hao
> Priority: Major
>
> ---HIVE---
> > create table table_test(id int);
> > alter table table_test change column id id int comment "hive comment";
> > desc formatted table_test;
> {code:java}
> +-------------------------------+----------------------------------------------------+----------------------------------------------------+
> | col_name | data_type | comment |
> +-------------------------------+----------------------------------------------------+----------------------------------------------------+
> | # col_name | data_type | comment |
> | id | int | hive comment |
> | | NULL | NULL |
> | # Detailed Table Information | NULL | NULL |
> | Database: | default | NULL |
> | OwnerType: | USER | NULL |
> | Owner: | anonymous | NULL |
> | CreateTime: | Wed Nov 23 23:06:41 CST 2022 | NULL |
> | LastAccessTime: | UNKNOWN | NULL |
> | Retention: | 0 | NULL |
> | Location: | hdfs://localhost:8020/warehouse/tablespace/managed/hive/table_test | NULL |
> | Table Type: | MANAGED_TABLE | NULL |
> | Table Parameters: | NULL | NULL |
> | | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\"}} |
> | | bucketing_version | 2 |
> | | last_modified_by | anonymous |
> | | last_modified_time | 1669216665 |
> | | numFiles | 0 |
> | | numRows | 0 |
> | | rawDataSize | 0 |
> | | totalSize | 0 |
> | | transactional | true |
> | | transactional_properties | default |
> | | transient_lastDdlTime | 1669216665 |
> | | NULL | NULL |
> | # Storage Information | NULL | NULL |
> | SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
> | InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
> | OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
> | Compressed: | No | NULL |
> | Num Buckets: | -1 | NULL |
> | Bucket Columns: | [] | NULL |
> | Sort Columns: | [] | NULL |
> | Storage Desc Params: | NULL | NULL |
> | | serialization.format | 1 |
> +-------------------------------+----------------------------------------------------+----------------------------------------------------+ {code}
> ---SPARK---
> > alter table table_test change column id id int comment "spark comment";
> > desc formatted table_test;
> {code:java}
> +-------------------------------+----------------------------------------------------+--------------+
> | col_name | data_type | comment |
> +-------------------------------+----------------------------------------------------+--------------+
> | id | int | spark comment |
> | | | |
> | # Detailed Table Information | | |
> | Catalog | spark_catalog | |
> | Database | default | |
> | Table | table_test | |
> | Owner | anonymous | |
> | Created Time | Wed Nov 23 23:06:41 CST 2022 | |
> | Last Access | UNKNOWN | |
> | Created By | Spark 2.2 or prior | |
> | Type | MANAGED | |
> | Provider | hive | |
> | Table Properties | [bucketing_version=2, last_modified_by=anonymous, last_modified_time=1669216665, transactional=true, transactional_properties=default, transient_lastDdlTime=1669216711] | |
> | Location | hdfs://localhost:8020/warehouse/tablespace/managed/hive/table_test | |
> | Serde Library | org.apache.hadoop.hive.ql.io.orc.OrcSerde | |
> | InputFormat | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | |
> | OutputFormat | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | |
> | Storage Properties | [serialization.format=1] | |
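> | Partition Provider | Catalog | |
> +-------------------------------+----------------------------------------------------+--------------+ {code}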
> ---HIVE---
> > alter table table_test change column id id int comment "hive new comment";
>
> ---SPARK---
> > desc formatted table_test;
> {code:java}
> +-------------------------------+----------------------------------------------------+--------------+
> | col_name | data_type | comment |
> +-------------------------------+----------------------------------------------------+--------------+
> | id | int | spark comment |
> | | | |
> | # Detailed Table Information | | |
> | Catalog | spark_catalog | |
> | Database | default | |
> | Table | table_test | |
> | Owner | anonymous | |
> | Created Time | Wed Nov 23 23:06:41 CST 2022 | |
> | Last Access | UNKNOWN | |
> | Created By | Spark 2.2 or prior | |
> | Type | MANAGED | |
> | Provider | hive | |
> | Table Properties | [bucketing_version=2, last_modified_by=anonymous, last_modified_time=1669216736, transactional=true, transactional_properties=default, transient_lastDdlTime=1669216736] | |
> | Location | hdfs://localhost:8020/warehouse/tablespace/managed/hive/table_test | |
> | Serde Library | org.apache.hadoop.hive.ql.io.orc.OrcSerde | |
> | InputFormat | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | |
> | OutputFormat | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | |
> | Storage Properties | [serialization.format=1] | |
> | Partition Provider | Catalog | |
> +-------------------------------+----------------------------------------------------+--------------+ {code}
>
> When a table field comment is modified alternately with Hive and Spark, the comment set by Hive cannot be queried using Spark SQL; Spark keeps returning the comment that it set itself.
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)