Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/06/08 03:02:00 UTC

[jira] [Commented] (IMPALA-9822) Impala does not notify user that row format delimited fields is only logical when using STORED AS TEXTFILE

    [ https://issues.apache.org/jira/browse/IMPALA-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359013#comment-17359013 ] 

Quanlong Huang commented on IMPALA-9822:
----------------------------------------

Assigning this to [~shikha.asrani], who has recently been working on it.

We can reproduce the issue using the data set in the Impala repo:
{code:bash}
[localhost:21050] default> create external table my_alltypestiny_tbl(id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_string_col STRING, string_col STRING, timestamp_col TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET location 'hdfs://localhost:20500/test-warehouse/alltypestiny/year=2009/month=1';
Query: create external table my_alltypestiny_tbl(id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_string_col STRING, string_col STRING, timestamp_col TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET location 'hdfs://localhost:20500/test-warehouse/alltypestiny/year=2009/month=1'
+-------------------------+
| summary                 |
+-------------------------+
| Table has been created. |
+-------------------------+
Fetched 1 row(s) in 0.08s
[localhost:21050] default> select count(*) from my_alltypestiny_tbl;
Query: select count(*) from my_alltypestiny_tbl
Query submitted at: 2021-06-08 10:57:09 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a14496cb6338af95:814c1d2f00000000
ERROR: File 'hdfs://localhost:20500/test-warehouse/alltypestiny/year=2009/month=1/090101.txt' has an invalid Parquet version number: 30 2e 30 a .
Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running "refresh default.my_alltypestiny_tbl".
{code}
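
For context on the error above: an (unencrypted) Parquet file ends with the 4-byte magic string PAR1, while the bytes printed in the message (30 2e 30 a) are plain ASCII text from the tail of the .txt file. The Python sketch below decodes those bytes and shows a simple trailing-magic check. The helper name looks_like_parquet is hypothetical and this is only an illustration, not Impala's actual validation code.

```python
import os

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these 4 bytes


def looks_like_parquet(path):
    """Hypothetical helper: True if the file's last 4 bytes are the Parquet magic."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(PARQUET_MAGIC):
            return False  # too short to hold the trailing magic
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read() == PARQUET_MAGIC


# The four bytes reported in the error message, decoded as ASCII text.
reported = bytes([0x30, 0x2E, 0x30, 0x0A])
decoded = reported.decode("ascii")
print(repr(decoded))
```

The decoded value is ordinary text ending in a newline, which is what one would expect when a delimited text file is read as if it were Parquet.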

We need the CREATE TABLE statement to emit a warning that the row format, i.e. "{{ROW FORMAT DELIMITED FIELDS TERMINATED BY ','}}", is ignored for a table that is STORED AS PARQUET.
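
The decision of when to warn could look roughly like the sketch below. It is written in Python for brevity (the Impala frontend itself is Java). The function name, its parameters, and the warning text are all hypothetical, and the rule that only TEXTFILE honors the row format is an assumption taken from the issue title.

```python
def row_format_warning(file_format, has_row_format):
    """Hypothetical check: return a warning string if ROW FORMAT would be ignored."""
    fmt = file_format.upper()
    # Assumption (from the issue title): only TEXTFILE tables use the row format.
    if has_row_format and fmt != "TEXTFILE":
        return "ROW FORMAT is ignored for tables stored as " + fmt + "."
    return None  # nothing to warn about


# The statement from the reproduction: ROW FORMAT DELIMITED ... STORED AS PARQUET.
print(row_format_warning("parquet", True))
```

Table creation would still succeed in this sketch. The point is only that the user hears about the ignored clause at CREATE TABLE time instead of at the first SELECT.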

> Impala does not notify user that row format delimited fields is only logical when using STORED AS TEXTFILE
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9822
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9822
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Alexandra Dunai
>            Assignee: Shikha Asrani
>            Priority: Minor
>              Labels: newbie, ramp-up, usability
>
> When a table is created with a "ROW FORMAT DELIMITED FIELDS" clause, Impala does not alert the user that the clause is only meaningful with STORED AS TEXTFILE.
> The mistake only surfaces later, when you run a SELECT against the table.
>  Table creation:
> {code:bash}
> [adunai-1.adunai.root.hwx.site:21000] default> CREATE EXTERNAL TABLE sales_fact_1997(product_id INT,time_id INT,customer_id INT,promotion_id INT,store_id INT,store_sales DECIMAL(10,4),store_cost DECIMAL(10,4),unit_sales DECIMAL(10,4))
>  > row format delimited fields terminated by '\011' STORED AS PARQUET
>  > location '/user/impala/mondrian/sales_fact_1997';
> Query: CREATE EXTERNAL TABLE sales_fact_1997(product_id INT,time_id INT,customer_id INT,promotion_id INT,store_id INT,store_sales DECIMAL(10,4),store_cost DECIMAL(10,4),unit_sales DECIMAL(10,4))row format delimited fields terminated by '\011' STORED AS PARQUET location '/user/impala/mondrian/sales_fact_1997'
>  
> +-------------------------+
> | summary                 |
> +-------------------------+
> | Table has been created. |
> +-------------------------+
> Fetched 1 row(s) in 0.10s
> {code}
>  
> Select: 
> {code:bash}
> [adunai-1.adunai.root.hwx.site:21000] mondrian> select count(*) from agg_c_10_sales_fact_1997;
> Query: select count(*) from agg_c_10_sales_fact_1997
> Query submitted at: 2020-06-03 11:55:06 (Coordinator: http://adunai-1.adunai.root.hwx.site:25000)
> Query progress can be monitored at: http://adunai-1.adunai.root.hwx.site:25000/query_plan?query_id=d547fafd0162da4e:872a95c100000000
> ERROR: File 'hdfs://adunai-2.adunai.root.hwx.site:8020/user/impala/mondrian/agg_c_10_sales_fact_1997/agg_c_10_sales_fact_1997.tsv' has an invalid Parquet version number: 717. Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running "refresh mondrian.agg_c_10_sales_fact_1997".
> {code}


