Posted to issues@ignite.apache.org by "Alexander Belyak (Jira)" <ji...@apache.org> on 2023/04/26 06:47:00 UTC

[jira] [Created] (IGNITE-19366) Monitoring in AI3

Alexander Belyak created IGNITE-19366:
-----------------------------------------

             Summary: Monitoring in AI3
                 Key: IGNITE-19366
                 URL: https://issues.apache.org/jira/browse/IGNITE-19366
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 3.0
            Reporter: Alexander Belyak


AI3 needs some monitoring tools ready prior to the first production installation.

In my opinion, we first need to write a document that:

1) defines the first set of monitoring tools (listing each aspect of what should be done)

2) describes each element at a high level and estimates its difficulty

3) splits the implementation into phases: must have, should have, nice to have

From my point of view, the most crucial thing is database locks. AI3 should be able to show what is preventing transaction processing (who is holding it up, and for how long).

To show this, AI3 could provide:
 * a system table/view listing every transaction that has at least one active lock or lock attempt, with its id and the id(s) of the tx it is waiting for.
 * the ability to write debug info to the log when a transaction is killed by a deadlock prevention mechanism (not sure if this should be part of this document)
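
To make the lock view more concrete, here is a rough sketch in Python (not AI3 code) of what its rows could carry. Everything in it is hypothetical: the `LockWait` record, the column names `tx_id`, `waiting_for` and `waited_ms`, and the `root_blockers` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LockWait:
    """Hypothetical per-transaction lock state; not an AI3 structure."""
    tx_id: str
    waiting_for: List[str]          # ids of the txs blocking this one (empty if not blocked)
    wait_started_ms: Optional[int]  # when the current wait began, None if not waiting


def lock_view_rows(waits: List[LockWait], now_ms: int) -> List[dict]:
    """Build the rows such a view might return, longest waiters first."""
    rows = []
    for w in waits:
        waited = now_ms - w.wait_started_ms if w.wait_started_ms is not None else 0
        rows.append({
            "tx_id": w.tx_id,
            "waiting_for": list(w.waiting_for),
            "waited_ms": waited,
        })
    rows.sort(key=lambda r: r["waited_ms"], reverse=True)
    return rows


def root_blockers(waits: List[LockWait]) -> List[str]:
    """Transactions that block others but are not waiting themselves."""
    blocked = {w.tx_id for w in waits if w.waiting_for}
    blocking = {b for w in waits for b in w.waiting_for}
    return sorted(blocking - blocked)


if __name__ == "__main__":
    sample = [
        LockWait("tx1", [], None),
        LockWait("tx2", ["tx1"], 1000),
        LockWait("tx3", ["tx2"], 4000),
    ]
    for row in lock_view_rows(sample, now_ms=5000):
        print(row)
    print("root blockers:", root_blockers(sample))
```

With the id of the awaited tx on every row, a user can follow the chain from a stuck transaction to the one that actually has to finish or be killed, which is the "who and for how long" question above.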

The second major problem is long-running queries.

To show these, AI3 could provide:
 * a system table/view with all running queries/txs with their origin (client/node/username), start time, text, and id.
 * the ability to log queries that took longer than N ms to the log file
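
The "longer than N ms" rule could look roughly like the Python sketch below (again, not AI3 code). The `RunningQuery` fields mirror the columns listed above; the class, the function names, and the log message format are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import List


@dataclass
class RunningQuery:
    """Hypothetical row of the running-queries view; not an AI3 structure."""
    query_id: str
    origin: str      # client/node/username
    start_ms: int    # start time in milliseconds
    text: str


def slow_queries(queries: List[RunningQuery], now_ms: int, threshold_ms: int) -> List[RunningQuery]:
    """Return the queries that have been running longer than threshold_ms."""
    return [q for q in queries if now_ms - q.start_ms > threshold_ms]


def log_slow_queries(queries: List[RunningQuery], now_ms: int, threshold_ms: int,
                     logger: logging.Logger) -> int:
    """Write one warning per slow query and return how many were logged."""
    slow = slow_queries(queries, now_ms, threshold_ms)
    for q in slow:
        logger.warning(
            "Long-running query id=%s origin=%s elapsed_ms=%d text=%s",
            q.query_id, q.origin, now_ms - q.start_ms, q.text,
        )
    return len(slow)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    running = [
        RunningQuery("q1", "client-1/alice", 0, "SELECT * FROM big_table"),
        RunningQuery("q2", "node-2/bob", 900, "SELECT 1"),
    ]
    log_slow_queries(running, now_ms=1000, threshold_ms=500,
                     logger=logging.getLogger("slow-query"))
```

Driving the log from the same data as the view keeps the two consistent: anything that appears in the log was also visible in the view while it was running.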

Other items could include:
 * index usage monitoring
 * memory usage (by tables, indexes, caches, metadata)
 * data integrity (can the user turn off a particular cluster node or not? Was rebalance finished?)
 * per-query resource consumption (pages actually read (from disk/mem, globally/locally?), CPU, memory used for caching)
 * node/cluster configuration
 * background processes status (index rebuild, autovacuum, schema changes background processing)

Mandatory requirement: each option must have user documentation (and a usage example?)

What it should not cover/be:
 * data statistics
 * query plans
 * performance tuning instructions/manuals
 * tuning options to prevent excessive locking/database overloading, such as time to live or deadlock detection/prevention mechanisms



--
This message was sent by Atlassian Jira
(v8.20.10#820010)