Posted to issues-all@impala.apache.org by "Miklos Szurap (Jira)" <ji...@apache.org> on 2022/03/30 09:58:00 UTC

[jira] [Created] (IMPALA-11209) Inconsistent results querying tables with subdirectories

Miklos Szurap created IMPALA-11209:
--------------------------------------

             Summary: Inconsistent results querying tables with subdirectories
                 Key: IMPALA-11209
                 URL: https://issues.apache.org/jira/browse/IMPALA-11209
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog, Frontend
            Reporter: Miklos Szurap


IMPALA-8454 introduced recursive listing of table/partition directories. It does not seem to handle the case properly when this new behavior is intentionally disabled through the impala.disable.recursive.listing=true table property. Within the same session, each REFRESH statement on the table flips the behavior; see the reproduction steps below.
{code}
CREATE EXTERNAL TABLE subdirtest (col1 string) partitioned by (p1 string) TBLPROPERTIES ('impala.disable.recursive.listing'='true');

ALTER TABLE subdirtest ADD PARTITION (p1='A');
{code}
Then ingest a file into a subdirectory of the partition:
{code}
hdfs dfs -mkdir /warehouse/tablespace/external/hive/subdirtest/p1=A/00
hdfs dfs -put testdata.parq /warehouse/tablespace/external/hive/subdirtest/p1=A/00/
{code}
The file "testdata.parq" matches the table schema and contains two rows.
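The layout above places the data file one level below the partition directory, which is exactly where recursive and non-recursive listing disagree. The following is only a local-filesystem analogue of that difference, not Impala's actual listing code; the temporary directory and the empty placeholder file are assumptions made for illustration.

```shell
#!/bin/sh
# Local-filesystem analogue of the partition layout; all paths are hypothetical.
part_dir=$(mktemp -d)/subdirtest/p1=A
mkdir -p "$part_dir/00"
: > "$part_dir/00/testdata.parq"   # empty placeholder, not a real Parquet file

# Non-recursive listing: only files directly inside the partition directory.
flat_count=$(find "$part_dir" -maxdepth 1 -type f | wc -l | tr -d ' ')

# Recursive listing: files in subdirectories are included as well.
recursive_count=$(find "$part_dir" -type f | wc -l | tr -d ' ')

echo "non-recursive: $flat_count, recursive: $recursive_count"
```

A non-recursive listing finds no file in this layout, while a recursive one finds the file under 00/.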
{code}
[coordinator.example.com:21050] default> refresh subdirtest;
...
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 0        |
+----------+

[coordinator.example.com:21050] default> refresh subdirtest;
...
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 2        |
+----------+

[coordinator.example.com:21050] default> refresh subdirtest;
...
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 0        |
+----------+

[coordinator.example.com:21050] default> refresh subdirtest;
...
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 2        |
+----------+
{code}
The count alternates between 0 and 2 after each refresh. This can be reproduced within a single impala-shell session (with no other coordinators or load balancing involved).
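The refresh-and-count cycle can also be scripted. The following is a minimal sketch, assuming impala-shell is on the PATH and the coordinator address is the one shown in the session; the function names refresh_and_count and counts_differ are hypothetical helpers, not part of Impala.

```shell
#!/bin/sh
# Hypothetical helpers for repeating the refresh/count cycle.

# Run REFRESH on the table, then print its row count in delimited (-B) form.
refresh_and_count() {
  host=$1
  table=$2
  impala-shell -i "$host" -q "refresh $table" >/dev/null 2>&1
  impala-shell -i "$host" -B -q "select count(*) from $table" 2>/dev/null
}

# Succeed (exit status 0) if the given counts are not all identical.
counts_differ() {
  first=$1
  for c in "$@"; do
    [ "$c" = "$first" ] || return 0
  done
  return 1
}

# Example usage (requires a running cluster, so it is left commented out):
#   counts=""
#   for i in 1 2 3 4; do
#     counts="$counts $(refresh_and_count coordinator.example.com:21050 subdirtest)"
#   done
#   counts_differ $counts && echo "inconsistent counts:$counts"
```

Running REFRESH and the SELECT as separate invocations keeps the captured output limited to the count itself.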



