Posted to issues@ignite.apache.org by "Vyacheslav Koptilin (Jira)" <ji...@apache.org> on 2023/05/09 12:17:00 UTC

[jira] [Updated] (IGNITE-19366) Monitoring in AI3

     [ https://issues.apache.org/jira/browse/IGNITE-19366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-19366:
-----------------------------------------
    Labels: ignite-3  (was: )

> Monitoring in AI3
> -----------------
>
>                 Key: IGNITE-19366
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19366
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Alexander Belyak
>            Priority: Critical
>              Labels: ignite-3
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> AI3 needs some monitoring tools ready prior to the first production installation.
> In my opinion, we first need to write a document that covers:
> 1) the first set of monitoring tools (list each aspect of what should be done)
> 2) a high-level description of each element, with an estimate of its difficulty
> 3) split the implementation into phases: must have, should have, nice to have
> From my point of view, the most crucial thing is database locks. AI3 should be able to show what prevents transaction processing (who, and for how long).
> To show this, AI3 may provide:
>  * a system table/view listing every transaction with at least one active lock or lock attempt, along with its id and the id(s) of the transaction(s) it is waiting for.
>  * the ability to write debug info to the log when a transaction is killed by a deadlock prevention mechanism (not sure if it should be a part of this document)
> The second major problem is long-running queries.
> To show this, AI3 may provide:
>  * a system table/view with all running queries/txs with their origin (client/node/username), start time, text, and id.
>  * the ability to write such queries to the log file (queries that took longer than N ms)
> Other items may include:
>  * index usage monitoring
>  * memory usage (by tables, indexes, caches, metadata)
>  * data integrity (can the user turn off a particular cluster node or not? Was rebalance finished?)
>  * per-query resource consumption (pages actually read (from disk/memory, globally/locally?), CPU, memory used for caching)
>  * node/cluster configuration
>  * background processes status (index rebuild, autovacuum, schema changes background processing)
> Mandatory requirement: each option must have user documentation (and a usage example?)
> What it should not cover/be:
>  * data statistics
>  * query plans
>  * performance tuning instructions/manuals
>  * tuning options to prevent excessive locking/database overloading like time to live, deadlock detection/prevention mechanisms
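
For illustration, the lock view proposed in the description could expose rows shaped roughly like the sketch below. This is only a sketch under assumptions: `LockWaitViewSketch`, `LockWaitRow`, and `waitingLongerThan` are hypothetical names, not existing Ignite 3 APIs, and the `waitingSince` timestamp is an assumed column added to answer the "for how long" part of the request.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a lock-wait view; none of these names are Ignite 3 APIs. */
final class LockWaitViewSketch {
    /** One row of the imagined view: a transaction and the transactions it waits for. */
    static final class LockWaitRow {
        final String txId;
        final List<String> waitingForTxIds; // empty when the tx holds locks but waits for nobody
        final Instant waitingSince;         // assumed column: when the tx took or requested its lock

        LockWaitRow(String txId, List<String> waitingForTxIds, Instant waitingSince) {
            this.txId = txId;
            this.waitingForTxIds = List.copyOf(waitingForTxIds);
            this.waitingSince = waitingSince;
        }

        Duration waitTime(Instant now) {
            return Duration.between(waitingSince, now);
        }
    }

    /** Returns ids of transactions that have been waiting on another tx for at least minWait. */
    static List<String> waitingLongerThan(List<LockWaitRow> rows, Duration minWait, Instant now) {
        List<String> result = new ArrayList<>();
        for (LockWaitRow row : rows) {
            boolean isWaiting = !row.waitingForTxIds.isEmpty();
            if (isWaiting && row.waitTime(now).compareTo(minWait) >= 0) {
                result.add(row.txId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-05-09T12:00:10Z");
        List<LockWaitRow> rows = List.of(
            new LockWaitRow("tx-1", List.of(), now.minusSeconds(10)),
            new LockWaitRow("tx-2", List.of("tx-1"), now.minusSeconds(8)),
            new LockWaitRow("tx-3", List.of("tx-1", "tx-2"), now.minusSeconds(1)));

        for (String txId : waitingLongerThan(rows, Duration.ofSeconds(5), now)) {
            System.out.println(txId + " has been blocked for at least 5 s");
        }
    }
}
```

With the sample rows in `main`, only `tx-2` is reported: `tx-1` waits for nobody and `tx-3` has been waiting for just one second. Keeping the waited-for ids in each row is what lets an operator follow the chain back to the transaction that is actually blocking progress.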

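The "queries that took longer than N ms" logging from the description could work along the lines of the sketch below. It is a minimal illustration using only the JDK: `SlowQueryLogSketch`, `isSlow`, and `runAndLogIfSlow` are hypothetical names, the log line format is an assumption, and none of this reflects how Ignite 3 actually executes or logs queries.

```java
import java.time.Duration;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical sketch of threshold-based slow-query logging; not an Ignite 3 API. */
final class SlowQueryLogSketch {
    private static final Logger LOG = Logger.getLogger("slow-query");

    /** True when the query ran strictly longer than the threshold (the "N ms"). */
    static boolean isSlow(Duration elapsed, Duration threshold) {
        return elapsed.compareTo(threshold) > 0;
    }

    /** Runs the query, then logs its id, origin, duration, and text if it exceeded the threshold. */
    static <T> T runAndLogIfSlow(String queryId, String origin, String sql,
                                 Duration threshold, Supplier<T> query) {
        long startNanos = System.nanoTime();
        try {
            return query.get();
        } finally {
            Duration elapsed = Duration.ofNanos(System.nanoTime() - startNanos);
            if (isSlow(elapsed, threshold)) {
                LOG.warning(String.format("Slow query id=%s origin=%s elapsedMs=%d text=%s",
                    queryId, origin, elapsed.toMillis(), sql));
            }
        }
    }

    public static void main(String[] args) {
        // Simulated query that sleeps past the 50 ms threshold, so one warning is logged.
        Integer rows = runAndLogIfSlow("q-42", "client:alice", "SELECT * FROM t",
            Duration.ofMillis(50), () -> {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return 7;
            });
        System.out.println("rows = " + rows);
    }
}
```

The logged fields mirror the columns suggested for the running-queries view (origin, text, and id), so the log entry and the view can be correlated by query id. Measuring in a `finally` block means a query that fails after running too long is still logged.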


--
This message was sent by Atlassian Jira
(v8.20.10#820010)