Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/05 22:53:00 UTC

[jira] [Commented] (IMPALA-7425) Add option to load incremental statistics from catalog

    [ https://issues.apache.org/jira/browse/IMPALA-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605040#comment-16605040 ] 

ASF subversion and git services commented on IMPALA-7425:
---------------------------------------------------------

Commit 72ee4a42753cfd8703ab5935dd61fcc8ad5fb7e1 in impala's branch refs/heads/master from [~vercego]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=72ee4a4 ]

IMPALA-7425: Change incremental stats to pull from catalogd.

Currently, incremental stats can consume a substantial
amount of metadata memory (per table, partition, column).
This metadata is transmitted from catalogd to all coordinators.
As a result, every coordinator holds this memory at all times
for every loaded table that uses incremental stats. Consequently,
coordinators and catalogd run out of memory (OOM) more often
when incremental stats are used, and more network bandwidth is consumed.
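
To make the scaling concrete, here is a rough back-of-the-envelope sketch of how this footprint grows. The figure of about 400 bytes per column per partition is an assumption taken from Impala's documentation on incremental stats, not from this commit; the function names and the example table and cluster sizes are hypothetical.

```python
# Rough estimate only. bytes_per_entry=400 is an assumed approximate cost
# per (partition, column) pair; it is not stated in this commit.

def incremental_stats_bytes(num_partitions, num_columns, bytes_per_entry=400):
    """Estimate incremental-stats metadata held by one process for one table."""
    return num_partitions * num_columns * bytes_per_entry


def cluster_footprint_bytes(per_table_bytes, num_coordinators):
    """Total held by catalogd plus every coordinator when stats are broadcast."""
    return per_table_bytes * (num_coordinators + 1)


# Hypothetical table: 10,000 partitions, 100 columns, 20 coordinators.
per_table = incremental_stats_bytes(num_partitions=10_000, num_columns=100)
total = cluster_footprint_bytes(per_table, num_coordinators=20)
print(f"per process: {per_table / 1e6:.0f} MB, cluster-wide: {total / 1e9:.1f} GB")
```

Under these assumptions a single such table costs about 400 MB in each process, which is why a copy at every coordinator adds up quickly.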

This change removes incremental stats from impalads. These stats
are only needed when computing incremental statistics and merging
new results with the existing results. They are not used by queries.
As a result, the change requires that coordinators fetch
incremental stats directly from catalogd when computing incremental stats.
In addition, catalogd no longer sends incremental stats to coordinators
via the statestore.
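
The difference between the old broadcast behavior and the new pull behavior can be sketched as a toy model. This is an illustration only, not Impala's code: the class names, method names, and the dictionary layout are all hypothetical, and the merge step is reduced to adding one partition's entry.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogd:
    # Maps (table, partition) -> opaque incremental-stats payload.
    incremental_stats: dict = field(default_factory=dict)

    def broadcast_payload(self, pull_mode):
        """Stats included in the update pushed to every coordinator."""
        if pull_mode:
            return {}  # pull mode: incremental stats are left out entirely
        return dict(self.incremental_stats)

    def get_partition_stats(self, table):
        """On-demand fetch for a single table (hypothetical method name)."""
        return {k: v for k, v in self.incremental_stats.items() if k[0] == table}


@dataclass
class Coordinator:
    catalogd: Catalogd
    pull_mode: bool
    cached_stats: dict = field(default_factory=dict)

    def receive_update(self):
        """Apply the broadcast update; this is what stays resident in memory."""
        self.cached_stats = self.catalogd.broadcast_payload(self.pull_mode)

    def compute_incremental_stats(self, table, new_partition, new_stats):
        """Merge stats for a new partition with the existing per-partition stats."""
        if self.pull_mode:
            # Fetched only for this statement, not kept in the cache.
            existing = self.catalogd.get_partition_stats(table)
        else:
            existing = {k: v for k, v in self.cached_stats.items() if k[0] == table}
        merged = dict(existing)
        merged[(table, new_partition)] = new_stats
        return merged


catalogd = Catalogd({("sales", "p1"): "stats-p1", ("other", "p1"): "stats-o1"})
coordinator = Coordinator(catalogd, pull_mode=True)
coordinator.receive_update()
print(len(coordinator.cached_stats))
print(sorted(coordinator.compute_incremental_stats("sales", "p2", "stats-p2")))
```

In this model both modes produce the same merged result; the only difference is whether the coordinator keeps a resident copy of every table's stats between statements.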

The option is enabled by setting a new flag, --pull_incremental_statistics,
on the catalogd and all impalad coordinators.

Testing:
  - manual testing
  - added end-to-end tests with --pull_incremental_statistics enabled
    for the compute-stats-incremental.test
  - added fe CatalogTest for new catalogd service method
  - passes exhaustive tests when --pull_incremental_statistics is enabled
    and disabled

Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Reviewed-on: http://gerrit.cloudera.org:8080/11193
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add option to load incremental statistics from catalog
> ------------------------------------------------------
>
>                 Key: IMPALA-7425
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7425
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>    Affects Versions: Impala 3.1.0
>            Reporter: Vuk Ercegovac
>            Assignee: Vuk Ercegovac
>            Priority: Major
>
> The data required for incremental statistics is currently stored in catalogd and in all impalad coordinators. However, this data is only needed when computing incremental statistics. When incremental statistics are used across many partition/column combinations (tables with many columns, many partitions, or both), this data can dominate the overall memory footprint. This can lead to OOMs, increased network usage, and instability.
> Add an option to avoid propagating incremental stats to all coordinators and instead pull them on demand from the catalog, only when needed by the compute incremental statistics statement.


