Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/08/03 01:12:00 UTC

[jira] [Assigned] (IMPALA-6536) CREATE TABLE on S3 takes a very long time

     [ https://issues.apache.org/jira/browse/IMPALA-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-6536:
-------------------------------------

    Assignee: Todd Lipcon  (was: Alexander Behm)

I was looking through old CRs and saw Alex left this: https://gerrit.cloudera.org/#/c/10176/

I'm not sure whether it's still of interest.

> CREATE TABLE on S3 takes a very long time
> -----------------------------------------
>
>                 Key: IMPALA-6536
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6536
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Alexander Behm
>            Assignee: Todd Lipcon
>            Priority: Critical
>              Labels: catalog, perfomance, s3
>
> *Summary*
> Creating a table that points to existing data in S3 can take an excessive amount of time.
> *Reason*
> If the Hive Metastore is configured with "hive.stats.autogather=true", Hive lists the files of each newly created table to populate basic statistics such as file count and file sizes in bytes. Unfortunately, this listing operation can take an excessive amount of time, particularly on S3.
> *Workaround*
> * Reconfigure the Hive Metastore with "hive.stats.autogather=false"
> * Note that TBLPROPERTIES("DO_NOT_UPDATE_STATS"="true") does not address the issue due to a bug in Hive
> Related:
> https://issues.apache.org/jira/browse/HIVE-18743
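> A minimal sketch of the first workaround, assuming the Hive Metastore reads its settings from a hive-site.xml file (deployments managed through a cluster manager may expose the same property elsewhere, and the service typically needs a restart to pick up the change):
> {code:xml}
> <!-- hive-site.xml fragment (assumed location): disable automatic basic-stats gathering -->
> <property>
>   <name>hive.stats.autogather</name>
>   <value>false</value>
> </property>
> {code}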
> *Example*
> {code}
> CREATE EXTERNAL TABLE tpch_lineitem_s3 (
>   l_orderkey BIGINT,
>   l_partkey BIGINT,
>   l_suppkey BIGINT,
>   l_linenumber BIGINT,
>   l_quantity DECIMAL(12,2),
>   l_extendedprice DECIMAL(12,2),
>   l_discount DECIMAL(12,2),
>   l_tax DECIMAL(12,2),
>   l_returnflag STRING,
>   l_linestatus STRING,
>   l_shipdate STRING,
>   l_commitdate STRING,
>   l_receiptdate STRING,
>   l_shipinstruct STRING,
>   l_shipmode STRING,
>   l_comment STRING
> )
> STORED AS PARQUET
> LOCATION "s3a://some_location/my_existing_data";
> {code}


