Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/05/24 21:10:00 UTC

[jira] [Commented] (IMPALA-11306) single_node_perf_run.py fail to load dataset if scale factor is 1

    [ https://issues.apache.org/jira/browse/IMPALA-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541712#comment-17541712 ] 

ASF subversion and git services commented on IMPALA-11306:
----------------------------------------------------------

Commit ad915ca58eaa004925d545057e9ebdba5d62131b in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ad915ca58 ]

IMPALA-11306: Create symlink for dataset of scale factor 1

single_node_perf_run.py and load-data.py can fail if the user sets the
scale factor argument to 1. This is because
generate-schema-statements.py inserts the scale factor into the
database name (e.g., "tpch1"), but the preload script omits the scale
factor when creating the dataset directory (e.g., "tpch"). This patch
fixes the issue by additionally creating a symlink for scale factor 1.
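
The idea of the fix can be sketched as follows. This is only an
illustration under stated assumptions, not the code from the patch: the
actual change lives in the preload script, and the function name
ensure_scale_one_symlink and its parameters are hypothetical. The sketch
assumes the dataset directory (e.g., "tpch") already exists under some
data root and adds a sibling symlink (e.g., "tpch1") pointing at it.

```python
import os
import tempfile


def ensure_scale_one_symlink(data_root, dataset, scale_factor):
    """Create <dataset>1 -> <dataset> under data_root when scale_factor is 1.

    Hypothetical helper for illustration only. Returns the symlink path,
    or None when the scale factor is not 1 and no symlink is needed.
    """
    if scale_factor != 1:
        return None
    link = os.path.join(data_root, "%s%d" % (dataset, scale_factor))
    # lexists() is also true for an existing (even dangling) symlink,
    # so calling this twice does not raise FileExistsError.
    if not os.path.lexists(link):
        # Relative target: the link stays valid if data_root is moved.
        os.symlink(dataset, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "tpch"))
        link = ensure_scale_one_symlink(root, "tpch", 1)
        print(link, "->", os.readlink(link))
```

With such a symlink in place, a path that includes the scale factor
("tpch1") and a path that omits it ("tpch") resolve to the same
directory, so neither script has to change how it builds names.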

Testing:
- Manual test by running the following script:
  ./bin/load-data.py --scale_factor=1 --workloads=targeted-perf \
    --table_formats=text/none/none

Change-Id: I76c9c90b243df6213626e11652cfed59643aed2c
Reviewed-on: http://gerrit.cloudera.org:8080/18545
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> single_node_perf_run.py fail to load dataset if scale factor is 1
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11306
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11306
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Minor
>              Labels: ramp-up
>
> single_node_perf_run.py has a required argument "scale". If scale > 1, the script runs fine. But if scale = 1 and load is true, the data loading script fails because the dataset is missing. This is because the preload script omits the scale number from the name when creating the dataset directory.
> [https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/testdata/datasets/tpch/preload#L27]
> For example, tpch at scale 1 creates the dataset dir "testdata/impala-data/tpch".
> On the other hand, generate-schema-statements.py will create template sql referring to "testdata/impala-data/tpch1".
> [https://github.com/apache/impala/blob/6ea15409b879a1286e72848defdda8d5d8568c19/testdata/bin/generate-schema-statements.py#L599] 
> Consider creating a symlink in the preload script if scale factor = 1.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org