You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Юрий <ju...@gmail.com> on 2018/09/21 12:16:02 UTC

Page IO statistics for Ignite

Hi Igniters,

I started IGNITE-8580
<https://issues.apache.org/jira/browse/IGNITE-8580> ticket
related to print page read/write metrics and did some investigation what
other databases provide for the similar purposes.

Based on the investigation I want to propose my raw vision of how to IGNITE
can be more transparent from performance perspective.

Need to collect statistics for logical (from memory) and physical (from
storage) page reads/writes. All these metrics should be separated by next
dimensions:
1) index/cache
2) query level
3) node/cluster
...

Seems the statistics should be limited by time.

If we will have such statistics we could realize such things as:
1) Get IO statistics per SQL query, global or/and splitted by
indexes/caches.
2) Have ability to understand why performance goes down in case it related
to IO. For example on concrete node or cache.
3) Evaluate effectiveness of use indexes. Find unused indexes.
4) Keep TOP queries with aggressive physical reads
....


Such statistics could be available least at JMX and SQL interfaces.

Let's discuss. In case it will be interested for you I can dig deeper into
the area and prepare IEP based on our discussion.


Igniters, what do you think?




-- 
Живи с улыбкой! :D

Re: Page IO statistics for Ignite

Posted by Юрий <ju...@gmail.com>.

Hi,

Thanks you for participate here.
I've prepared some draft of IEP-27
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-27%3A+Page+IO+statistics>,
which aggregate information from the thread. Please take a look and give
your feedback.

Also there are some open questions:
1) We can simple gathering metrics on each nodes. But also we need cluster
wide metrics. How it can be achieved? I see that we already share some
metrics through TcpDiscoveryMetricsUpdateMessage
and TcpDiscoveryClientMetricsUpdateMessage. If we add all IO metrics here
it will grow up more then two times. Can it lead to performance degradation?
2) How it will be more convenient to reset statistics? My opinion it can be
done by time (for example once per hour) and by request + keep history of
the statistics. What do you think?

Please find below example of node statistics which can be simple gathered
on node. Such statistics can be obtained separately for physical read/write
and logical read:
1. Fine-grained:
{T_PAGE_LIST_NODE=1592, UNKNOWN=0, T_PART_META=865, T_H2_MVCC_REF_LEAF=0,
T_TX_LOG_INNER=0, T_DATA=592, T_DATA_METASTORAGE=0,
T_CACHE_ID_AWARE_PENDING_REF_INNER=0, T_CACHE_ID_DATA_REF_MVCC_LEAF=0,
T_PAGE_LIST_META=365, T_DATA_REF_INNER=0, T_H2_EX_REF_MVCC_INNER=0,
T_PENDING_REF_LEAF=408, T_DATA_REF_METASTORAGE_LEAF=0,
T_DATA_REF_MVCC_INNER=0, T_DATA_REF_METASTORAGE_INNER=0,
T_DATA_REF_MVCC_LEAF=0, T_CACHE_ID_AWARE_DATA_REF_LEAF=0,
T_CACHE_ID_AWARE_PENDING_REF_LEAF=0, T_TX_LOG_LEAF=0, T_DATA_REF_LEAF=1000,
T_H2_EX_REF_MVCC_LEAF=0, T_BPLUS_META=408, T_PAGE_UPDATE_TRACKING=0,
T_H2_REF_LEAF=0, T_METASTORE_INNER=0, T_META=0, T_H2_REF_INNER=0,
T_PART_CNTRS=0, T_H2_EX_REF_INNER=0, T_CACHE_ID_AWARE_DATA_REF_INNER=0,
T_PENDING_REF_INNER=0, T_CACHE_ID_DATA_REF_MVCC_INNER=0,
T_H2_EX_REF_LEAF=0, T_H2_MVCC_REF_INNER=0, T_METASTORE_LEAF=0}
2. Aggregated:
{DATA=592, INDEX=1000, OTHER=3638}

Maybe I missed something important, please share you opinion.

Thanks.

вт, 25 сент. 2018 г. в 22:20, Alex Plehanov <pl...@gmail.com>:

> Hi,
>
> I've made some investigation a couple of months ago about a statistics
> collected by some RDBMS vendors (Oracle, Postgres, MySQL). These databases
> collect detailed IO statistics in dimensions such as queries, database
> objects (tables and indexes), files, sessions, users, event types etc.
>
> Some views where you can get IO statistics:
> Oracle: v$filestat, v$segment_statistics, v$sqlarea, v$sysstat, v$sesstat
> Postgres: pg_stat_database, pg_statio_all_tables, pg_statio_all_indexes,
> pg_statio_all_sequences
> MySQL: table_io_waits_summary_by_table,
> table_io_waits_summary_by_index_usage, io_global_by_file_by_bytes,
> io_global_by_wait_by_bytes, file_summary_by_event_name,
> file_summary_by_instance, host_summary_by_file_io, user_summary_by_file_io,
> metrics, etc.
>
> I think we can start by collecting statistics per FilePageStore (updating
> counters on read(...) and write(...) methods). Each FilePageStore is
> bounded to cache and partition or cache index, so we can easily aggregate
> values and get IO statistics per cache/index/node.
>
> вт, 25 сент. 2018 г. в 10:57, Vladimir Ozerov <vo...@gridgain.com>:
>
>> Hi Yuriy,
>>
>> I think this is great idea. But we need to collect more details on how and
>> what to collect. I think one of the most interesting parts for us would be
>> index and data page usages, split by different "dimensions":
>> 1) Global node statistics
>> 2) Per-cache statistics
>> 3) Per-index statistics
>>
>> We can start with a short summary of what is collected by other database
>> vendors.
>>
>> Vladimir.
>>
>>
>> On Sat, Sep 22, 2018 at 1:07 AM Denis Magda <dm...@apache.org> wrote:
>>
>> > Hello Yuri,
>> >
>> > I might give useful feedback if see how the metrics will look like from
>> the
>> > API standpoint. If it's not difficult please create a draft.
>> >
>> > AS for the interface, in addition to JMX and SQL we need to ensure Visor
>> > CMD and Web Console gets updated. *Alex K.*, please join the thread and
>> > share your requirements.
>> >
>> > --
>> > Denis
>> >
>> > On Fri, Sep 21, 2018 at 8:16 AM Юрий <ju...@gmail.com>
>> wrote:
>> >
>> > > Hi Igniters,
>> > >
>> > > I started IGNITE-8580
>> > > <https://issues.apache.org/jira/browse/IGNITE-8580> ticket
>> > > related to print page read/write metrics and did some investigation
>> what
>> > > other databases provide for the similar purposes.
>> > >
>> > > Based on the investigation I want to propose my raw vision of how to
>> > IGNITE
>> > > can be more transparent from performance perspective.
>> > >
>> > > Need to collect statistics for logical (from memory) and physical
>> (from
>> > > storage) page reads/writes. All these metrics should be separated by
>> next
>> > > dimensions:
>> > > 1) index/cache
>> > > 2) query level
>> > > 3) node/cluster
>> > > ...
>> > >
>> > > Seems the statistics should be limited by time.
>> > >
>> > > If we will have such statistics we could realize such things as:
>> > > 1) Get IO statistics per SQL query, global or/and splitted by
>> > > indexes/caches.
>> > > 2) Have ability to understand why performance goes down in case it
>> > related
>> > > to IO. For example on concrete node or cache.
>> > > 3) Evaluate effectiveness of use indexes. Find unused indexes.
>> > > 4) Keep TOP queries with aggressive physical reads
>> > > ....
>> > >
>> > >
>> > > Such statistics could be available least at JMX and SQL interfaces.
>> > >
>> > > Let's discuss. In case it will be interested for you I can dig deeper
>> > into
>> > > the area and prepare IEP based on our discussion.
>> > >
>> > >
>> > > Igniters, what do you think?
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > Живи с улыбкой! :D
>> > >
>> >
>>
>

-- 
Живи с улыбкой! :D

Re: Page IO statistics for Ignite

Posted by Alex Plehanov <pl...@gmail.com>.

Hi,

I've made some investigation a couple of months ago about a statistics
collected by some RDBMS vendors (Oracle, Postgres, MySQL). These databases
collect detailed IO statistics in dimensions such as queries, database
objects (tables and indexes), files, sessions, users, event types etc.

Some views where you can get IO statistics:
Oracle: v$filestat, v$segment_statistics, v$sqlarea, v$sysstat, v$sesstat
Postgres: pg_stat_database, pg_statio_all_tables, pg_statio_all_indexes,
pg_statio_all_sequences
MySQL: table_io_waits_summary_by_table,
table_io_waits_summary_by_index_usage, io_global_by_file_by_bytes,
io_global_by_wait_by_bytes, file_summary_by_event_name,
file_summary_by_instance, host_summary_by_file_io, user_summary_by_file_io,
metrics, etc.

I think we can start by collecting statistics per FilePageStore (updating
counters on read(...) and write(...) methods). Each FilePageStore is
bounded to cache and partition or cache index, so we can easily aggregate
values and get IO statistics per cache/index/node.

вт, 25 сент. 2018 г. в 10:57, Vladimir Ozerov <vo...@gridgain.com>:

> Hi Yuriy,
>
> I think this is great idea. But we need to collect more details on how and
> what to collect. I think one of the most interesting parts for us would be
> index and data page usages, split by different "dimensions":
> 1) Global node statistics
> 2) Per-cache statistics
> 3) Per-index statistics
>
> We can start with a short summary of what is collected by other database
> vendors.
>
> Vladimir.
>
>
> On Sat, Sep 22, 2018 at 1:07 AM Denis Magda <dm...@apache.org> wrote:
>
> > Hello Yuri,
> >
> > I might give useful feedback if see how the metrics will look like from
> the
> > API standpoint. If it's not difficult please create a draft.
> >
> > AS for the interface, in addition to JMX and SQL we need to ensure Visor
> > CMD and Web Console gets updated. *Alex K.*, please join the thread and
> > share your requirements.
> >
> > --
> > Denis
> >
> > On Fri, Sep 21, 2018 at 8:16 AM Юрий <ju...@gmail.com>
> wrote:
> >
> > > Hi Igniters,
> > >
> > > I started IGNITE-8580
> > > <https://issues.apache.org/jira/browse/IGNITE-8580> ticket
> > > related to print page read/write metrics and did some investigation
> what
> > > other databases provide for the similar purposes.
> > >
> > > Based on the investigation I want to propose my raw vision of how to
> > IGNITE
> > > can be more transparent from performance perspective.
> > >
> > > Need to collect statistics for logical (from memory) and physical (from
> > > storage) page reads/writes. All these metrics should be separated by
> next
> > > dimensions:
> > > 1) index/cache
> > > 2) query level
> > > 3) node/cluster
> > > ...
> > >
> > > Seems the statistics should be limited by time.
> > >
> > > If we will have such statistics we could realize such things as:
> > > 1) Get IO statistics per SQL query, global or/and splitted by
> > > indexes/caches.
> > > 2) Have ability to understand why performance goes down in case it
> > related
> > > to IO. For example on concrete node or cache.
> > > 3) Evaluate effectiveness of use indexes. Find unused indexes.
> > > 4) Keep TOP queries with aggressive physical reads
> > > ....
> > >
> > >
> > > Such statistics could be available least at JMX and SQL interfaces.
> > >
> > > Let's discuss. In case it will be interested for you I can dig deeper
> > into
> > > the area and prepare IEP based on our discussion.
> > >
> > >
> > > Igniters, what do you think?
> > >
> > >
> > >
> > >
> > > --
> > > Живи с улыбкой! :D
> > >
> >
>

Re: Page IO statistics for Ignite

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Hi Yuriy,

I think this is great idea. But we need to collect more details on how and
what to collect. I think one of the most interesting parts for us would be
index and data page usages, split by different "dimensions":
1) Global node statistics
2) Per-cache statistics
3) Per-index statistics

We can start with a short summary of what is collected by other database
vendors.

Vladimir.


On Sat, Sep 22, 2018 at 1:07 AM Denis Magda <dm...@apache.org> wrote:

> Hello Yuri,
>
> I might give useful feedback if see how the metrics will look like from the
> API standpoint. If it's not difficult please create a draft.
>
> AS for the interface, in addition to JMX and SQL we need to ensure Visor
> CMD and Web Console gets updated. *Alex K.*, please join the thread and
> share your requirements.
>
> --
> Denis
>
> On Fri, Sep 21, 2018 at 8:16 AM Юрий <ju...@gmail.com> wrote:
>
> > Hi Igniters,
> >
> > I started IGNITE-8580
> > <https://issues.apache.org/jira/browse/IGNITE-8580> ticket
> > related to print page read/write metrics and did some investigation what
> > other databases provide for the similar purposes.
> >
> > Based on the investigation I want to propose my raw vision of how to
> IGNITE
> > can be more transparent from performance perspective.
> >
> > Need to collect statistics for logical (from memory) and physical (from
> > storage) page reads/writes. All these metrics should be separated by next
> > dimensions:
> > 1) index/cache
> > 2) query level
> > 3) node/cluster
> > ...
> >
> > Seems the statistics should be limited by time.
> >
> > If we will have such statistics we could realize such things as:
> > 1) Get IO statistics per SQL query, global or/and splitted by
> > indexes/caches.
> > 2) Have ability to understand why performance goes down in case it
> related
> > to IO. For example on concrete node or cache.
> > 3) Evaluate effectiveness of use indexes. Find unused indexes.
> > 4) Keep TOP queries with aggressive physical reads
> > ....
> >
> >
> > Such statistics could be available least at JMX and SQL interfaces.
> >
> > Let's discuss. In case it will be interested for you I can dig deeper
> into
> > the area and prepare IEP based on our discussion.
> >
> >
> > Igniters, what do you think?
> >
> >
> >
> >
> > --
> > Живи с улыбкой! :D
> >
>

Re: Page IO statistics for Ignite

Posted by Denis Magda <dm...@apache.org>.

Hello Yuri,

I might give useful feedback if see how the metrics will look like from the
API standpoint. If it's not difficult please create a draft.

AS for the interface, in addition to JMX and SQL we need to ensure Visor
CMD and Web Console gets updated. *Alex K.*, please join the thread and
share your requirements.

--
Denis

On Fri, Sep 21, 2018 at 8:16 AM Юрий <ju...@gmail.com> wrote:

> Hi Igniters,
>
> I started IGNITE-8580
> <https://issues.apache.org/jira/browse/IGNITE-8580> ticket
> related to print page read/write metrics and did some investigation what
> other databases provide for the similar purposes.
>
> Based on the investigation I want to propose my raw vision of how to IGNITE
> can be more transparent from performance perspective.
>
> Need to collect statistics for logical (from memory) and physical (from
> storage) page reads/writes. All these metrics should be separated by next
> dimensions:
> 1) index/cache
> 2) query level
> 3) node/cluster
> ...
>
> Seems the statistics should be limited by time.
>
> If we will have such statistics we could realize such things as:
> 1) Get IO statistics per SQL query, global or/and splitted by
> indexes/caches.
> 2) Have ability to understand why performance goes down in case it related
> to IO. For example on concrete node or cache.
> 3) Evaluate effectiveness of use indexes. Find unused indexes.
> 4) Keep TOP queries with aggressive physical reads
> ....
>
>
> Such statistics could be available least at JMX and SQL interfaces.
>
> Let's discuss. In case it will be interested for you I can dig deeper into
> the area and prepare IEP based on our discussion.
>
>
> Igniters, what do you think?
>
>
>
>
> --
> Живи с улыбкой! :D
>