Posted to issues@ignite.apache.org by "Vyacheslav Koptilin (Jira)" <ji...@apache.org> on 2023/05/09 12:17:00 UTC
[jira] [Updated] (IGNITE-19366) Monitoring in AI3
[ https://issues.apache.org/jira/browse/IGNITE-19366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-19366:
-----------------------------------------
Labels: ignite-3 (was: )
> Monitoring in AI3
> -----------------
>
> Key: IGNITE-19366
> URL: https://issues.apache.org/jira/browse/IGNITE-19366
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 3.0
> Reporter: Alexander Belyak
> Priority: Critical
> Labels: ignite-3
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> AI3 needs some monitoring tools ready prior to the first production installation.
> In my opinion, we first need to write a document that:
> 1) defines the first set of monitoring tools (listing each aspect of what should be done)
> 2) describes each element at a high level and estimates its difficulty
> 3) split the implementation into phases: must have, should have, nice to have
> From my point of view, the most crucial thing is database locks. AI3 should be able to show what (who, and for how long) is blocking transaction processing.
> To show this, AI3 may provide:
> * a system table/view with all transactions with at least one active lock/lock attempt, its id and id(s) of the tx it's waiting for.
> * ability to log some debug info into the log when a transaction is killed by a deadlock prevention mechanism (not sure if it should be a part of this document)
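A minimal sketch of how rows from such a lock view could be used to answer "who is blocking this transaction". It assumes each waiting transaction waits for a single other transaction, which is a simplification of the "id(s)" wording above. The class name, method name, and string transaction ids are hypothetical and not part of any Ignite API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not an Ignite class. */
final class LockWaitChains {
    /**
     * Follows "waits for" edges starting at startTx and returns the blockers,
     * nearest first. Stops at a transaction that waits for nothing, or as soon
     * as a transaction repeats (a wait cycle).
     */
    static List<String> blockerChain(Map<String, String> waitsFor, String startTx) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(startTx);
        String current = waitsFor.get(startTx);
        while (current != null && seen.add(current)) {
            chain.add(current);
            current = waitsFor.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Made-up rows: tx-3 waits for tx-2, which in turn waits for tx-1.
        Map<String, String> waitsFor = Map.of("tx-3", "tx-2", "tx-2", "tx-1");
        System.out.println(blockerChain(waitsFor, "tx-3"));
    }
}
```

The last element of the returned chain is the transaction to look at first, since it holds a lock without waiting on anyone else (unless the chain ended because of a cycle).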
> The second major problem is long-running queries.
> To show this, AI3 may provide:
> * a system table/view with all running queries/txs with their origin (client/node/username), start time, text, and id.
> * ability to log such queries into the log file (queries that took longer than N ms)
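A minimal sketch of the "longer than N ms" check, applied to rows shaped like the running-queries view described above (id, origin, start time, text). The `RunningQuery` type and `longerThan` method are hypothetical names for illustration; how AI3 would actually expose or log this is not decided in this ticket.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not an Ignite class. */
final class SlowQueryFilter {
    /** One row of an assumed running-queries view. */
    static final class RunningQuery {
        final String id;
        final String origin;
        final Instant startTime;
        final String text;

        RunningQuery(String id, String origin, Instant startTime, String text) {
            this.id = id;
            this.origin = origin;
            this.startTime = startTime;
            this.text = text;
        }
    }

    /** Returns the queries that have been running for more than thresholdMillis as of now. */
    static List<RunningQuery> longerThan(List<RunningQuery> running, Instant now, long thresholdMillis) {
        List<RunningQuery> slow = new ArrayList<>();
        for (RunningQuery q : running) {
            long elapsedMillis = Duration.between(q.startTime, now).toMillis();
            if (elapsedMillis > thresholdMillis) {
                slow.add(q);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Made-up rows: one query started 10 s ago, one started 200 ms ago.
        List<RunningQuery> running = List.of(
            new RunningQuery("q-1", "client", now.minusSeconds(10), "SELECT 1"),
            new RunningQuery("q-2", "node", now.minusMillis(200), "SELECT 2"));
        for (RunningQuery q : longerThan(running, now, 1000)) {
            System.out.println(q.id + " from " + q.origin + ": " + q.text);
        }
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the check deterministic and easy to test.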
> Other items could include:
> * index usage monitoring
> * memory usage (by tables, indexes, caches, metadata)
> * data integrity (can the user turn off a particular cluster node or not? Was rebalance finished?)
> * per-query resource consumption (pages actually read (from disk/memory, globally/locally?), CPU, memory used for caching)
> * node/cluster configuration
> * background processes status (index rebuild, autovacuum, schema changes background processing)
> Mandatory requirement: each option must have user documentation (and a usage example?)
> What it should not cover/be:
> * data statistics
> * query plans
> * performance tuning instructions/manuals
> * tuning options to prevent excessive locking/database overloading like time to live, deadlock detection/prevention mechanisms
--
This message was sent by Atlassian Jira
(v8.20.10#820010)