Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2018/12/12 01:24:00 UTC

[jira] [Updated] (HIVE-16957) Support CTAS for auto gather column stats

     [ https://issues.apache.org/jira/browse/HIVE-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-16957:
-------------------------------------------
    Description: 
The idea is to rely as much as possible on the logic in ColumnStatsSemanticAnalyzer, as other operations do. In particular, they create an 'analyze table t compute statistics for columns' statement, use ColumnStatsSemanticAnalyzer to parse it, and connect the resulting plan to the existing INSERT/INSERT OVERWRITE statement. The challenge for CTAS or CREATE MATERIALIZED VIEW is that the table object does not exist yet, so we cannot rely fully on ColumnStatsSemanticAnalyzer.

Thus, we use the same process, but ColumnStatsSemanticAnalyzer produces a statement for column stats collection that uses a table values clause instead of the original table reference:
{code}
select compute_stats(col1), compute_stats(col2), compute_stats(col3)
from table(values(cast(null as int), cast(null as int), cast(null as string))) as t(col1, col2, col3);
{code}
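
For illustration only, the shape of that generated statement can be sketched as a small string builder. This is not Hive's implementation (ColumnStatsSemanticAnalyzer is Java code inside Hive); the function name build_ctas_stats_query and its parameters are hypothetical, and the sketch assumes the target schema is available as (column name, Hive type) pairs taken from the CTAS query's output schema.

```python
from typing import Sequence, Tuple


def build_ctas_stats_query(columns: Sequence[Tuple[str, str]], alias: str = "t") -> str:
    """Build a column-stats query over a typed, all-NULL VALUES row.

    Hypothetical helper: mirrors the statement shown in the description,
    where the not-yet-created table is replaced by a table values clause.
    `columns` holds (column_name, hive_type) pairs.
    """
    if not columns:
        raise ValueError("at least one column is required")

    # One compute_stats(...) call per target column.
    aggregates = ", ".join(f"compute_stats({name})" for name, _ in columns)
    # A single row of typed NULLs carries only the schema, no data.
    null_row = ", ".join(f"cast(null as {hive_type})" for _, hive_type in columns)
    # Column aliases give the VALUES row the target table's column names.
    names = ", ".join(name for name, _ in columns)

    return (
        f"select {aggregates}\n"
        f"from table(values({null_row})) as {alias}({names})"
    )


if __name__ == "__main__":
    print(build_ctas_stats_query([("col1", "int"), ("col2", "int"), ("col3", "string")]))
```

The typed NULL row exists only so the statement can be parsed and planned with the right column names and types before the table exists; per the description, the resulting plan is then connected to the plan of the CTAS statement.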

> Support CTAS for auto gather column stats
> -----------------------------------------
>
>                 Key: HIVE-16957
>                 URL: https://issues.apache.org/jira/browse/HIVE-16957
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major


