Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2017/10/13 16:56:00 UTC

[jira] [Created] (IMPALA-6052) Improve test data directory structure

Joe McDonnell created IMPALA-6052:
-------------------------------------

             Summary: Improve test data directory structure
                 Key: IMPALA-6052
                 URL: https://issues.apache.org/jira/browse/IMPALA-6052
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 2.10.0
            Reporter: Joe McDonnell


Dataload generates the HDFS location using this code:
hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
if data_set in ['hive-benchmark', 'functional']:
  hdfs_location = hdfs_location.split('.')[-1]
Here, db_suffix describes the table's file format and compression (for example, _seq). Some examples:
functional.alltypes is stored in /test-warehouse/alltypes/
functional.alltypesagg is stored in /test-warehouse/alltypesagg/
functional_seq.alltypes is stored in /test-warehouse/alltypes_seq/
functional_seq.alltypesagg is stored in /test-warehouse/alltypesagg_seq/
Tables from the same database are not grouped into a directory. Instead, almost every table in functional gets its own top-level directory. In a normal dataload, hdfs dfs -ls /test-warehouse lists 998 directories. This makes the HDFS directory structure hard to browse, and it makes it hard to import or export a single database and its tables.
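
To make the flattening concrete, here is a minimal sketch that wraps the quoted logic in a function and reproduces the four example paths. The function name current_hdfs_location and its argument list are hypothetical (they are not taken from the dataload scripts), and the sketch assumes db_name is the base name "functional" with the format carried separately in db_suffix.

```python
def current_hdfs_location(data_set, db_name, table_name, db_suffix):
    """Hypothetical wrapper around the location logic quoted in this issue.

    Returns the path relative to /test-warehouse.
    """
    hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
    if data_set in ['hive-benchmark', 'functional']:
        # Keeping only the part after the last '.' drops the database name,
        # so every table ends up directly under the warehouse root.
        hdfs_location = hdfs_location.split('.')[-1]
    return hdfs_location


if __name__ == '__main__':
    for suffix in ['', '_seq']:
        for table in ['alltypes', 'alltypesagg']:
            location = current_hdfs_location('functional', 'functional', table, suffix)
            print('/test-warehouse/{0}/'.format(location))
```

Because the database name is discarded for these data sets, the number of top-level directories grows with the number of tables times the number of file formats.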

The tables for a database should be in a single directory for that database. The HDFS location should be of the form "${db_name}${db_suffix}.db/${table_name}". For example, functional.alltypes should be in '/test-warehouse/functional.db/alltypes'. With the default dataload, the top-level directory should end up with about 50 items.
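
As a rough illustration of the proposed form, the sketch below builds the location from the same three inputs. The function name proposed_hdfs_location is hypothetical, and this is only one way the change might look, not the actual patch.

```python
def proposed_hdfs_location(db_name, table_name, db_suffix):
    """Hypothetical helper for the proposed layout.

    Implements "${db_name}${db_suffix}.db/${table_name}", relative to
    /test-warehouse, so all tables of one database share a directory.
    """
    return '{0}{1}.db/{2}'.format(db_name, db_suffix, table_name)


if __name__ == '__main__':
    for suffix in ['', '_seq']:
        for table in ['alltypes', 'alltypesagg']:
            location = proposed_hdfs_location('functional', table, suffix)
            print('/test-warehouse/{0}'.format(location))
```

Under this scheme the four example tables would collapse into two top-level directories, functional.db and functional_seq.db, with the tables nested inside.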

This will require changes in generate_schema_statement.py (in generate_statements(), where the hdfs_location is generated). It will also require changes to the schema templates, such as testdata/datasets/functional/functional_schema_template.sql, and is likely to require corresponding test changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)