You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Юрий <ju...@gmail.com> on 2018/10/02 10:24:08 UTC
Re: Page IO statistics for Ignite

Hi,

Thanks you for participate here.
I've prepared some draft of IEP-27
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-27%3A+Page+IO+statistics>,
which aggregate information from the thread. Please take a look and give
your feedback.

Also there are some open questions:
1) We can simple gathering metrics on each nodes. But also we need cluster
wide metrics. How it can be achieved? I see that we already share some
metrics through TcpDiscoveryMetricsUpdateMessage
and TcpDiscoveryClientMetricsUpdateMessage. If we add all IO metrics here
it will grow up more then two times. Can it lead to performance degradation?
2) How it will be more convenient to reset statistics? My opinion it can be
done by time (for example once per hour) and by request + keep history of
the statistics. What do you think?

Please find below example of node statistics which can be simple gathered
on node. Such statistics can be obtained separately for physical read/write
and logical read:
1. Fine-grained:
{T_PAGE_LIST_NODE=1592, UNKNOWN=0, T_PART_META=865, T_H2_MVCC_REF_LEAF=0,
T_TX_LOG_INNER=0, T_DATA=592, T_DATA_METASTORAGE=0,
T_CACHE_ID_AWARE_PENDING_REF_INNER=0, T_CACHE_ID_DATA_REF_MVCC_LEAF=0,
T_PAGE_LIST_META=365, T_DATA_REF_INNER=0, T_H2_EX_REF_MVCC_INNER=0,
T_PENDING_REF_LEAF=408, T_DATA_REF_METASTORAGE_LEAF=0,
T_DATA_REF_MVCC_INNER=0, T_DATA_REF_METASTORAGE_INNER=0,
T_DATA_REF_MVCC_LEAF=0, T_CACHE_ID_AWARE_DATA_REF_LEAF=0,
T_CACHE_ID_AWARE_PENDING_REF_LEAF=0, T_TX_LOG_LEAF=0, T_DATA_REF_LEAF=1000,
T_H2_EX_REF_MVCC_LEAF=0, T_BPLUS_META=408, T_PAGE_UPDATE_TRACKING=0,
T_H2_REF_LEAF=0, T_METASTORE_INNER=0, T_META=0, T_H2_REF_INNER=0,
T_PART_CNTRS=0, T_H2_EX_REF_INNER=0, T_CACHE_ID_AWARE_DATA_REF_INNER=0,
T_PENDING_REF_INNER=0, T_CACHE_ID_DATA_REF_MVCC_INNER=0,
T_H2_EX_REF_LEAF=0, T_H2_MVCC_REF_INNER=0, T_METASTORE_LEAF=0}
2. Aggregated:
{DATA=592, INDEX=1000, OTHER=3638}


Maybe I missed something important, please share you opinion.


Thanks.


вт, 25 сент. 2018 г. в 22:20, Alex Plehanov <pl...@gmail.com>:

> Hi,
>
> I've made some investigation a couple of months ago about a statistics
> collected by some RDBMS vendors (Oracle, Postgres, MySQL). These databases
> collect detailed IO statistics in dimensions such as queries, database
> objects (tables and indexes), files, sessions, users, event types etc.
>
> Some views where you can get IO statistics:
> Oracle: v$filestat, v$segment_statistics, v$sqlarea, v$sysstat, v$sesstat
> Postgres: pg_stat_database, pg_statio_all_tables, pg_statio_all_indexes,
> pg_statio_all_sequences
> MySQL: table_io_waits_summary_by_table,
> table_io_waits_summary_by_index_usage, io_global_by_file_by_bytes,
> io_global_by_wait_by_bytes, file_summary_by_event_name,
> file_summary_by_instance, host_summary_by_file_io, user_summary_by_file_io,
> metrics, etc.
>
> I think we can start by collecting statistics per FilePageStore (updating
> counters on read(...) and write(...) methods). Each FilePageStore is
> bounded to cache and partition or cache index, so we can easily aggregate
> values and get IO statistics per cache/index/node.
>
> вт, 25 сент. 2018 г. в 10:57, Vladimir Ozerov <vo...@gridgain.com>:
>
>> Hi Yuriy,
>>
>> I think this is great idea. But we need to collect more details on how and
>> what to collect. I think one of the most interesting parts for us would be
>> index and data page usages, split by different "dimensions":
>> 1) Global node statistics
>> 2) Per-cache statistics
>> 3) Per-index statistics
>>
>> We can start with a short summary of what is collected by other database
>> vendors.
>>
>> Vladimir.
>>
>>
>> On Sat, Sep 22, 2018 at 1:07 AM Denis Magda <dm...@apache.org> wrote:
>>
>> > Hello Yuri,
>> >
>> > I might give useful feedback if see how the metrics will look like from
>> the
>> > API standpoint. If it's not difficult please create a draft.
>> >
>> > AS for the interface, in addition to JMX and SQL we need to ensure Visor
>> > CMD and Web Console gets updated. *Alex K.*, please join the thread and
>> > share your requirements.
>> >
>> > --
>> > Denis
>> >
>> > On Fri, Sep 21, 2018 at 8:16 AM Юрий <ju...@gmail.com>
>> wrote:
>> >
>> > > Hi Igniters,
>> > >
>> > > I started IGNITE-8580
>> > > <https://issues.apache.org/jira/browse/IGNITE-8580> ticket
>> > > related to print page read/write metrics and did some investigation
>> what
>> > > other databases provide for the similar purposes.
>> > >
>> > > Based on the investigation I want to propose my raw vision of how to
>> > IGNITE
>> > > can be more transparent from performance perspective.
>> > >
>> > > Need to collect statistics for logical (from memory) and physical
>> (from
>> > > storage) page reads/writes. All these metrics should be separated by
>> next
>> > > dimensions:
>> > > 1) index/cache
>> > > 2) query level
>> > > 3) node/cluster
>> > > ...
>> > >
>> > > Seems the statistics should be limited by time.
>> > >
>> > > If we will have such statistics we could realize such things as:
>> > > 1) Get IO statistics per SQL query, global or/and splitted by
>> > > indexes/caches.
>> > > 2) Have ability to understand why performance goes down in case it
>> > related
>> > > to IO. For example on concrete node or cache.
>> > > 3) Evaluate effectiveness of use indexes. Find unused indexes.
>> > > 4) Keep TOP queries with aggressive physical reads
>> > > ....
>> > >
>> > >
>> > > Such statistics could be available least at JMX and SQL interfaces.
>> > >
>> > > Let's discuss. In case it will be interested for you I can dig deeper
>> > into
>> > > the area and prepare IEP based on our discussion.
>> > >
>> > >
>> > > Igniters, what do you think?
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > Живи с улыбкой! :D
>> > >
>> >
>>
>

-- 
Живи с улыбкой! :D