
Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2017/10/17 20:03:00 UTC

[jira] [Created] (IMPALA-6068) Dataload does not populate functional_*.complextypes_fileformat correctly

Joe McDonnell created IMPALA-6068:
-------------------------------------

             Summary: Dataload does not populate functional_*.complextypes_fileformat correctly
                 Key: IMPALA-6068
                 URL: https://issues.apache.org/jira/browse/IMPALA-6068
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 2.10.0
            Reporter: Joe McDonnell
            Priority: Critical


functional.complextypes_fileformat is a text table containing some nested data.

Dataload is supposed to generate functional.complextypes_fileformat and the functional_*.complextypes_fileformat tables in this order:

1. Create table functional.complextypes_fileformat
2. Populate functional.complextypes_fileformat using:

   INSERT OVERWRITE TABLE {db_name}{db_suffix}.{table_name} SELECT id, named_struct("f1",string_col,"f2",int_col), array(1, 2, 3), map("k", cast(0 as bigint)) FROM functional.alltypestiny;

3. Create tables functional_*.complextypes_fileformat
4. Populate each table using:

   INSERT OVERWRITE TABLE {table_name} SELECT * FROM functional.{table_name};
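The intended sequence can be modeled as a minimal in-memory Python sketch. It is not Impala's dataload code: tables are plain lists of rows, and the step functions, the illustrative `_parquet`/`_avro` suffixes, and the stand-in source rows are all hypothetical. Running the steps in the intended order leaves every derived table with the same rows as the base table.

```python
# Hypothetical in-memory model of the four dataload steps (not Impala code).
# Each table is a list of row tuples keyed by "db.table".

SOURCE_ROWS = [(i, f"s{i}", i) for i in range(8)]  # stand-in for functional.alltypestiny
SUFFIXES = ["_parquet", "_avro"]  # illustrative functional_* databases
TABLE = "complextypes_fileformat"


def step1_create_base(tables):
    tables[f"functional.{TABLE}"] = []


def step2_populate_base(tables):
    # Mirrors SELECT id, named_struct(...), array(1, 2, 3), map("k", 0)
    tables[f"functional.{TABLE}"] = [
        (rid, {"f1": s, "f2": n}, [1, 2, 3], {"k": 0}) for rid, s, n in SOURCE_ROWS
    ]


def step3_create_derived(tables):
    for suffix in SUFFIXES:
        tables[f"functional{suffix}.{TABLE}"] = []


def step4_copy_derived(tables):
    # Mirrors INSERT OVERWRITE TABLE {table_name} SELECT * FROM functional.{table_name}
    for suffix in SUFFIXES:
        tables[f"functional{suffix}.{TABLE}"] = list(tables[f"functional.{TABLE}"])


def run(steps):
    """Apply the steps in order and return the row count of each table."""
    tables = {}
    for step in steps:
        step(tables)
    return {name: len(rows) for name, rows in tables.items()}


intended = run([step1_create_base, step2_populate_base,
                step3_create_derived, step4_copy_derived])
print(intended)
```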

However, dataload runs these steps in the wrong order: #1, #3, #4, and only then #2. Step #4 therefore copies from a still-empty functional.complextypes_fileformat, so every functional_*.complextypes_fileformat table ends up with zero rows. Oddly, dataload also generates a #4 statement that targets functional.complextypes_fileformat itself, overwriting the table with rows selected from the same table. Dataload should run the steps in the correct order and skip this self-referencing insert.
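To make the failure concrete, the following standalone Python sketch replays the observed order #1, #3, #4, #2 and then derives a valid order from explicit dependencies using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is hypothetical: it tracks only row counts and does not reflect how dataload actually schedules statements. The step names, the single illustrative derived table, and the row count of 8 are assumptions. The base table is deliberately left out of the derived list, which is one way to avoid the self-overwriting insert.

```python
from graphlib import TopologicalSorter

BASE = "functional.complextypes_fileformat"
# Illustrative derived table; the base table is intentionally excluded so that
# step 4 never copies functional.complextypes_fileformat onto itself.
DERIVED = ["functional_parquet.complextypes_fileformat"]
SOURCE_ROW_COUNT = 8  # arbitrary non-zero stand-in for the source row count


def run(order):
    """Replay the steps in `order` and return the row count per table."""
    rows = {}
    for step in order:
        if step == "1_create_base":
            rows[BASE] = 0
        elif step == "2_populate_base":
            rows[BASE] = SOURCE_ROW_COUNT
        elif step == "3_create_derived":
            for table in DERIVED:
                rows[table] = 0
        elif step == "4_copy_from_base":
            for table in DERIVED:
                rows[table] = rows[BASE]  # SELECT * FROM functional.{table_name}
        else:
            raise ValueError(f"unknown step: {step}")
    return rows


# Each step maps to the set of steps that must finish before it runs.
DEPENDS_ON = {
    "2_populate_base": {"1_create_base"},
    "4_copy_from_base": {"2_populate_base", "3_create_derived"},
}

observed = run(["1_create_base", "3_create_derived",
                "4_copy_from_base", "2_populate_base"])
fixed_order = list(TopologicalSorter(DEPENDS_ON).static_order())
fixed = run(fixed_order)
print(observed)
print(fixed_order, fixed)
```

With the observed order, the base table is populated but the derived table reports zero rows. Any order produced by the topological sort runs #2 before #4, so the derived table ends up with the same row count as the base.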

These tables are only used by frontend tests, but the bug can cause problems with recent versions of Hive: Hive appears to skip creating a file when it would write zero rows, which can change the number of files listed in the plan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)