Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/03/19 03:29:00 UTC

[jira] [Commented] (IMPALA-8125) Limit number of files generated by unpartitioned insert

    [ https://issues.apache.org/jira/browse/IMPALA-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795614#comment-16795614 ] 

ASF subversion and git services commented on IMPALA-8125:
---------------------------------------------------------

Commit 9090fc239b62c0d698c6c256c20d86b69f8cc64f in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9090fc2 ]

IMPALA-8097: mt_dop for all queries via hidden flag

--unlock_mt_dop=true unlocks mt_dop for all queries
including joins and inserts.

This disables the parallel plans with separate join builds
when running standalone, because these are not executable
until IMPALA-4224 is implemented. Inserts work without
modification - they were disabled because of lack of
testing and the possibility for generating many small
files with unpartitioned inserts - see IMPALA-8125.

Testing:
Add custom cluster test that exercises joins, runtime filters
and inserts as a sanity check for the flag.

Ran exhaustive build.

Manually ran TPC-H and TPC-DS tests against a minicluster
with mt_dop = 4.

Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Reviewed-on: http://gerrit.cloudera.org:8080/12257
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Limit number of files generated by unpartitioned insert
> -------------------------------------------------------
>
>                 Key: IMPALA-8125
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8125
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Priority: Major
>
> One pitfall of multithreaded execution is that, if implemented naively, the number of files generated by an unpartitioned insert will be multiplied by mt_dop.
> We should provide a mechanism to limit the number of files generated, e.g. limit the number of insert fragment instances (note that there is a pre-existing problem with unpartitioned inserts generating too many files).
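
The arithmetic behind the issue description can be sketched roughly as follows. This is an illustration only, not Impala's planner code: the function name, the `max_insert_instances` parameter, and the assumption that every insert fragment instance writes at least one file are all hypothetical.

```python
def estimated_insert_files(num_nodes, mt_dop, max_insert_instances=None):
    """Rough lower bound on files written by an unpartitioned insert.

    Assumptions (hypothetical, for illustration only):
      - each node runs max(1, mt_dop) insert fragment instances
      - each instance writes at least one output file
      - max_insert_instances, if given, caps the total instance count
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if mt_dop < 0:
        raise ValueError("mt_dop must be non-negative")

    # mt_dop = 0 is treated as single-threaded execution per node.
    instances = num_nodes * max(1, mt_dop)

    # The proposed mitigation: limit the number of insert fragment instances.
    if max_insert_instances is not None:
        if max_insert_instances < 1:
            raise ValueError("max_insert_instances must be at least 1")
        instances = min(instances, max_insert_instances)

    return instances


if __name__ == "__main__":
    print(estimated_insert_files(10, 0))                           # no multithreading
    print(estimated_insert_files(10, 4))                           # multiplied by mt_dop
    print(estimated_insert_files(10, 4, max_insert_instances=10))  # capped
```

Under these assumptions, a 10-node cluster with mt_dop = 4 would write at least four times as many files as the same insert without multithreading, and capping the instance count brings the number back down. How such a cap would actually be exposed is not specified in the issue.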



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
