Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2017/10/13 16:56:00 UTC
[jira] [Created] (IMPALA-6052) Improve test data directory structure
Joe McDonnell created IMPALA-6052:
-------------------------------------
Summary: Improve test data directory structure
Key: IMPALA-6052
URL: https://issues.apache.org/jira/browse/IMPALA-6052
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 2.10.0
Reporter: Joe McDonnell
Dataload generates the hdfs location using this code:
hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
if data_set in ['hive-benchmark', 'functional']:
  hdfs_location = hdfs_location.split('.')[-1]
Here db_suffix describes the table's file format and compression (for example, _seq for the functional_seq database). Some examples:
functional.alltypes is stored in /test-warehouse/alltypes/
functional.alltypesagg is stored in /test-warehouse/alltypesagg/
functional_seq.alltypes is stored in /test-warehouse/alltypes_seq/
functional_seq.alltypesagg is stored in /test-warehouse/alltypesagg_seq/
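As a rough sketch, the snippet above can be wrapped in a standalone function to reproduce these paths. The function name current_hdfs_location, the WAREHOUSE_ROOT constant, and the argument values are illustrative assumptions, not the actual dataload code. It assumes db_name is the base database name (functional) and db_suffix is the suffix (empty or _seq).

```python
WAREHOUSE_ROOT = '/test-warehouse'  # root taken from the examples above


def current_hdfs_location(data_set, db_name, table_name, db_suffix):
    """Hypothetical wrapper around the quoted dataload logic."""
    hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
    if data_set in ['hive-benchmark', 'functional']:
        # Drops the database name, so every table lands at the top level.
        hdfs_location = hdfs_location.split('.')[-1]
    return hdfs_location


for table_name, db_suffix in [('alltypes', ''), ('alltypesagg', ''),
                              ('alltypes', '_seq'), ('alltypesagg', '_seq')]:
    location = current_hdfs_location('functional', 'functional',
                                     table_name, db_suffix)
    print('{0}/{1}/'.format(WAREHOUSE_ROOT, location))
# /test-warehouse/alltypes/
# /test-warehouse/alltypesagg/
# /test-warehouse/alltypes_seq/
# /test-warehouse/alltypesagg_seq/
```

Because the database name is dropped for these data sets, each table and each format variant adds another top-level directory.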
Tables from the same database are not grouped into a directory. Instead, almost every table in functional gets its own top-level directory. After a normal dataload, hdfs dfs -ls /test-warehouse lists 998 directories. This makes it hard to browse our HDFS directory structure. It also makes it hard to import or export a single database and its tables.
The tables for a database should be in a single directory for that database. The HDFS location should be of the form "${db_name}${db_suffix}.db/${table_name}". For example, functional.alltypes should be in '/test-warehouse/functional.db/alltypes'. With the default dataload, the top-level directory should end up with about 50 items.
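A minimal sketch of the proposed layout, assuming a hypothetical helper named proposed_hdfs_location (not an existing function in the dataload scripts). It only illustrates the "${db_name}${db_suffix}.db/${table_name}" form and how the four example tables collapse into two top-level directories.

```python
import posixpath

WAREHOUSE_ROOT = '/test-warehouse'  # root taken from the examples in this issue


def proposed_hdfs_location(db_name, table_name, db_suffix):
    """Hypothetical helper: ${db_name}${db_suffix}.db/${table_name}."""
    return '{0}{1}.db/{2}'.format(db_name, db_suffix, table_name)


# (db_name, table_name, db_suffix) for the four example tables.
tables = [
    ('functional', 'alltypes', ''),
    ('functional', 'alltypesagg', ''),
    ('functional', 'alltypes', '_seq'),
    ('functional', 'alltypesagg', '_seq'),
]

paths = [posixpath.join(WAREHOUSE_ROOT, proposed_hdfs_location(*t))
         for t in tables]
# The path component directly under the warehouse root is the database.
top_level = sorted(set(p.split('/')[2] for p in paths))

for p in paths:
    print(p)
print(top_level)  # ['functional.db', 'functional_seq.db']
```

Under this scheme the number of top-level entries grows with the number of databases rather than the number of tables.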
This will require changes in generate_schema_statement.py (in generate_statements(), where the hdfs_location is generated). It will also require changes to the schema templates, such as testdata/datasets/functional/functional_schema_template.sql, and is likely to require corresponding test changes.