Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/02/19 01:30:00 UTC

[jira] [Commented] (IMPALA-11124) testdata loading should reuse TPCH/TPCDS local data if they exist

    [ https://issues.apache.org/jira/browse/IMPALA-11124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494862#comment-17494862 ] 

ASF subversion and git services commented on IMPALA-11124:
----------------------------------------------------------

Commit 1697af02d6c96b82f19cc75235719afcb864ebe2 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1697af0 ]

IMPALA-11124: Reuse local TPCH/TPCDS data in testdata loading

When loading testdata for TPC-H/TPC-DS, we first run a preload script to
generate local data, and then upload it to HDFS to be used by Hive.
The preload script currently always regenerates the data, which is
time-consuming at large scale factors.

This patch modifies the preload scripts to check whether the last run
succeeded and, if it did, reuse its data. Otherwise, the scripts generate
the data and leave a success marker in the data directory.

Tests:
 - Verified the scripts locally.

Change-Id: Ied40e599cda009ae0ad88ad13385e7bb86428bb4
Reviewed-on: http://gerrit.cloudera.org:8080/18233
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
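The commit message describes the reuse logic but not the scripts themselves. Below is a minimal Python sketch of the "check the success marker, otherwise regenerate and mark" pattern. It is not the actual Impala preload code. The marker file name `_GENERATED_OK`, the function `prepare_local_data`, and the `generate` callback are all hypothetical, chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical marker name; the commit message does not say what file
# name the real preload scripts use.
SUCCESS_MARKER = "_GENERATED_OK"


def prepare_local_data(data_dir: Path, generate: Callable[[Path], None]) -> bool:
    """Reuse data_dir if a previous run succeeded; otherwise regenerate it.

    Returns True if existing data was reused, False if it was regenerated.
    """
    marker = data_dir / SUCCESS_MARKER
    if marker.is_file():
        # The last run finished and left its marker: skip the slow step.
        return True
    # No marker: the directory is missing, or a previous run failed or was
    # interrupted, so any files in it cannot be trusted. Start from scratch.
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True)
    generate(data_dir)
    # Write the marker only after generate() returns without raising, so a
    # failed or killed run never looks successful to the next one.
    marker.touch()
    return False


if __name__ == "__main__":
    calls = []

    def fake_generate(d: Path) -> None:
        # Hypothetical stand-in for the real (slow) TPC-H/TPC-DS generator.
        calls.append(d)
        (d / "lineitem.tbl").write_text("1|2|3\n")

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "tpch_sf30"
        print(prepare_local_data(target, fake_generate))  # False: generated
        print(prepare_local_data(target, fake_generate))  # True: reused
        print(len(calls))  # 1
```

Writing the marker as the very last step is the key design choice: if generation crashes or is interrupted partway, no marker exists, so the next data load regenerates the data instead of uploading a partial set to HDFS.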


> testdata loading should reuse TPCH/TPCDS local data if they exist
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11124
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11124
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> When loading testdata for TPC-H/TPC-DS, we first run a preload script to generate local data, and then upload it to HDFS to be used by Hive. It's time-consuming to run the preload script at large scale factors (e.g. 30). We should reuse the data if it exists.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)