Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2017/10/17 20:03:00 UTC
[jira] [Created] (IMPALA-6068) Dataload does not populate functional_*.complextypes_fileformat correctly
Joe McDonnell created IMPALA-6068:
-------------------------------------
Summary: Dataload does not populate functional_*.complextypes_fileformat correctly
Key: IMPALA-6068
URL: https://issues.apache.org/jira/browse/IMPALA-6068
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Affects Versions: Impala 2.10.0
Reporter: Joe McDonnell
Priority: Critical
functional.complextypes_fileformat is a text table containing some nested data.
Dataload is supposed to generate functional.complextypes_fileformat in this order:
1. Create table functional.complextypes_fileformat
2. Populate functional.complextypes_fileformat using
INSERT OVERWRITE TABLE {db_name}{db_suffix}.{table_name} SELECT id, named_struct("f1",string_col,"f2",int_col), array(1, 2, 3), map("k", cast(0 as bigint)) FROM functional.alltypestiny;
3. Create tables functional_*.complextypes_fileformat
4. Populate each table using:
INSERT OVERWRITE TABLE {table_name} SELECT * FROM functional.{table_name};
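The required ordering can be written as a small dependency graph. This is a hypothetical model, not Impala's dataload code. Steps are keyed by the numbers used above, and the sketch assumes step 3 does not itself depend on steps 1 or 2, so the only hard constraint is that step 4 runs after both step 2 and step 3. It uses the standard-library `graphlib` module (Python 3.9+) to derive a valid order and to check whether a given order respects every dependency.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the four dataload steps, keyed by step number.
# Each step maps to the set of steps that must finish before it runs.
DEPENDENCIES = {
    1: set(),   # create functional.complextypes_fileformat
    2: {1},     # populate it from functional.alltypestiny
    3: set(),   # create the functional_*.complextypes_fileformat tables
    4: {2, 3},  # populate them with SELECT * FROM functional.<table>
}


def respects_dependencies(order, deps):
    """Return True if every step in `order` runs after all of its prerequisites."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[pre] < position[step]
        for step, pres in deps.items()
        for pre in pres
    )


# A valid order derived from the graph by topological sort.
valid_order = list(TopologicalSorter(DEPENDENCIES).static_order())

print(respects_dependencies(valid_order, DEPENDENCIES))   # True
print(respects_dependencies([1, 2, 3, 4], DEPENDENCIES))  # True
print(respects_dependencies([1, 3, 4, 2], DEPENDENCIES))  # False: step 4 runs before step 2
```

A check like this only says whether an order is safe; it does not say which order any real dataload script emits.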
However, dataload runs these steps in the wrong order: #1, #3, #4, and finally #2. Step #4 therefore copies from a table that is still empty, so every functional_*.complextypes_fileformat table ends up with zero rows. Oddly, dataload also generates step #4 for functional.complextypes_fileformat itself, so that table is overwritten with rows read from itself. Dataload should run the steps in the correct order and avoid inserting functional.complextypes_fileformat into itself.
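To make the consequence concrete, here is a toy simulation, again hypothetical and not Impala code. Tables are plain Python lists, the placeholder rows stand in for the output of step 2 (their contents and count are invented for illustration), and `functional_parquet` / `functional_avro` are illustrative names for the functional_* databases. Running the steps as #1, #3, #4, #2 leaves every copy empty while functional itself ends up populated.

```python
# Placeholder rows standing in for the output of step 2; the real rows come
# from functional.alltypestiny and are not reproduced here (assumption).
SOURCE_ROWS = [(i, {"f1": str(i), "f2": i}, [1, 2, 3], {"k": 0}) for i in range(4)]

# Illustrative functional_* database names (assumption).
VARIANT_DBS = ["functional_parquet", "functional_avro"]


def run_dataload(order, source_rows, variant_dbs):
    """Simulate steps 1-4 in the given order; return row counts per database."""
    tables = {}
    for step in order:
        if step == 1:  # create functional.complextypes_fileformat (empty)
            tables["functional"] = []
        elif step == 2:  # INSERT OVERWRITE from the source rows
            tables["functional"] = list(source_rows)
        elif step == 3:  # create each functional_* table (empty)
            for db in variant_dbs:
                tables[db] = []
        elif step == 4:  # INSERT OVERWRITE ... SELECT * FROM functional.<table>
            for db in variant_dbs:
                tables[db] = list(tables["functional"])
    return {db: len(rows) for db, rows in tables.items()}


print(run_dataload([1, 2, 3, 4], SOURCE_ROWS, VARIANT_DBS))
# {'functional': 4, 'functional_parquet': 4, 'functional_avro': 4}
print(run_dataload([1, 3, 4, 2], SOURCE_ROWS, VARIANT_DBS))
# {'functional': 4, 'functional_parquet': 0, 'functional_avro': 0}
```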
These tables are only used by frontend tests, but the empty tables can cause problems with recent versions of Hive: Hive appears to skip creating a file when an insert writes zero rows, which can change the number of files listed in the query plan.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)