You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/07/23 00:15:00 UTC

[jira] [Commented] (IMPALA-4746) num_nodes should take any value and use that many nodes

    [ https://issues.apache.org/jira/browse/IMPALA-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163144#comment-17163144 ] 

Tim Armstrong commented on IMPALA-4746:
---------------------------------------

IMPALA-8125 implements this for write fragments only - that allows controlling # of small files without affecting the rest of the query plan.

> num_nodes should take any value and use that many nodes
> -------------------------------------------------------
>
>                 Key: IMPALA-4746
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4746
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Peter Ebert
>            Priority: Major
>
> Ideally num_nodes should be expanded to allow any # of nodes (up to total number of Impalad's in the cluster).  This would randomly select nodes to run fragments on, prioritizing data locallity (where possible).  Locality gets more complex if there are multiple scans, perhaps defaulting to the largest table might work best
> This would conceivably be helpful in several scenarios
> * Scale performance testing: could run a query with num_nodes = 10 vs 20 to get a rough idea of scaling performance.
> * Avoid writing small files: if doing an insert on larger clusters (200+), even fairly large data sets can be spread out too thin.  Setting # of nodes would allow you to control small files which then simplifies read query plans and # of fragments.
> * Control cluster utilization: some queries may perform just as well with less nodes, particularly where there are broadcast joins where data would be sent over the network anyways.  There may be some multi-tenancy improvements by restricting the number of nodes some queries can use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org