
Posted to dev@ignite.apache.org by Sasha Belyak <rt...@gmail.com> on 2020/09/24 11:30:20 UTC

IEP-58: Statistics

Igniters,
I've prepared an IEP [1], please review and let me know what you think.

In particular, I'd like to discuss the new subsystem that collects
statistics to optimize SQL query execution.
[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-58+Statistics

Re: Re[2]: IEP-58: Statistics

Posted by Sasha Belyak <rt...@gmail.com>.

Hello Zhenya,
First of all, local statistics are not useless, because we can use them in
the local H2 query planning phase, at least for now.
"Client ..." - any node that can build query execution plans. I believe
that later we will be able to do query execution planning on client nodes.
But it's a good question, and I have renamed "client node" to "query
planning node".
" If there are no statistics in all of them - client will choose random   "
- I suppose we can choose any server node that can hold the data covered by
the requested statistics. But if the query planning node needs statistics
for two or more tables, those tables can be located on separate groups of
server nodes, so such requests should be sent separately. So the answer is
yes (we should send statistics requests and collection requests only to
nodes storing the table) and no (a collection request can be sent to any of
those nodes).
"After getting statistics client will cache it and server node it to renew
statistics from same node" - I mean that after getting the collected
statistics, the client node can remember which server node sent them, so
that it can get future updates from the same node. The client will renew
its cache using a TTL approach, while the server can decide when statistics
should be collected again by, for example, counting the number of changed
rows in the underlying tables.
"What's the storage mechanism for client node statistics?" - no storage.
Even server nodes won't store global statistics persistently, but they will
store local partition-level statistics to speed up collection after a
restart (aggregation instead of a full collection).
"Can we use thin client without discs in such cases?" - certainly, no
persistent store is needed on any client node.
I made minor changes to the IEP according to your remarks. Follow-up
questions are welcome.
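
To make the renewal scheme above more concrete, here is a minimal Java
sketch. It is an illustration only and is not taken from the IEP or from the
Ignite code base: the class names (StatisticsSketch, TableStats,
PlannerStatsCache, ChangeTracker), the fetch callback, and the threshold
policy are all hypothetical. It assumes the planning node keeps statistics
in memory only and refreshes an entry once its TTL expires, and that a
server node counts changed rows and signals a recollection once a threshold
is reached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class StatisticsSketch {
    /** Hypothetical, heavily simplified table statistics. */
    static final class TableStats {
        final String table;
        final long rowCount;

        TableStats(String table, long rowCount) {
            this.table = table;
            this.rowCount = rowCount;
        }
    }

    /** Planning-node side: in-memory cache, entries are refreshed after a TTL. */
    static final class PlannerStatsCache {
        private static final class Entry {
            final TableStats stats;
            final long loadedAtMs;

            Entry(TableStats stats, long loadedAtMs) {
                this.stats = stats;
                this.loadedAtMs = loadedAtMs;
            }
        }

        private final Map<String, Entry> entries = new ConcurrentHashMap<>();
        private final long ttlMs;
        private final LongSupplier clock;
        // Assumption: stands in for a request to a server node that stores the table.
        private final Function<String, TableStats> fetchFromServer;

        PlannerStatsCache(long ttlMs, LongSupplier clock,
                          Function<String, TableStats> fetchFromServer) {
            this.ttlMs = ttlMs;
            this.clock = clock;
            this.fetchFromServer = fetchFromServer;
        }

        /** Returns cached statistics, fetching again if missing or older than the TTL. */
        TableStats get(String table) {
            long now = clock.getAsLong();
            Entry e = entries.get(table);
            if (e == null || now - e.loadedAtMs >= ttlMs) {
                e = new Entry(fetchFromServer.apply(table), now);
                entries.put(table, e);
            }
            return e.stats;
        }
    }

    /** Server side: counts changed rows and signals when to collect again. */
    static final class ChangeTracker {
        private final long threshold;
        private long changedRows;

        ChangeTracker(long threshold) {
            this.threshold = threshold;
        }

        /** Records one changed row; returns true once the threshold is reached. */
        synchronized boolean onRowChanged() {
            if (++changedRows >= threshold) {
                changedRows = 0; // start counting towards the next recollection
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};                    // fake clock, in milliseconds
        AtomicLong fetches = new AtomicLong(); // counts simulated server requests
        PlannerStatsCache cache = new PlannerStatsCache(1_000L, () -> now[0],
            table -> new TableStats(table, 100L * fetches.incrementAndGet()));

        System.out.println(cache.get("PERSON").rowCount); // first call fetches
        now[0] = 500L;
        System.out.println(cache.get("PERSON").rowCount); // still within the TTL
        now[0] = 1_000L;
        System.out.println(cache.get("PERSON").rowCount); // TTL expired, fetched again
    }
}
```

Nothing here is persisted, which matches the point that a planning node
needs no disk: after a restart the cache is simply empty and is filled again
on the first request.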

On Fri, Oct 16, 2020 at 14:31, Zhenya Stanilovsky
<arzamas123@mail.ru.invalid> wrote:

>
> Andrey, thanks for raising this!
> Sasha, this part is unclear to me: « These part consists of two processes:
> statistics collection process itself and acquiring statistics by the
> client. »:
> *  I agree that in both cases local statistics are useless.
> Maybe we need more informative use cases for such statistics? Can
> someone add statistics for additional columns (possibly not present in
> an index)?
> *  Client: can you expand on this term? Does it mean an Ignite client
> node? Is the best SQL plan chosen on the node that started the request?
> If so, what about a client with limited CPU?
> *  « If there are no statistics in all of them - client will choose random
>   » - not random but based on affinity, isn't it?
> *  « After getting statistics client will cache it and server node it to
> renew statistics from same node. » I don't understand this
> approach, can you clarify it, please?
> *  What's the storage mechanism for client node statistics?
> *  Can we use a thin client without disks in such cases?
> Thanks!
>
> >:
> >
> >Follow up
> >
> >Igniters,
> >
> >Are there any comments on this IEP?
> >
> >JFYI, the IEP has been renamed and moved here [1]
> >
> >[1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-58%3A+Statistics+for+SQL+query+optimization
> >
> >On Thu, Sep 24, 2020 at 2:30 PM Sasha Belyak < rtsforex@gmail.com >
> wrote:
> >>
> >> Igniters,
> >> I've prepared an IEP [1], please review and let me know what you think.
> >>
> >> In particular, I'd like to discuss the new subsystem to collect
> statistics
> >> to optimize sql queries execution.
> >> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-58+Statistics
>
>
>

Re[2]: IEP-58: Statistics

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.

Andrey, thanks for raising this!
Sasha, this part is unclear to me: « These part consists of two processes: statistics collection process itself and acquiring statistics by the client. »:
*  I agree that in both cases local statistics are useless.
Maybe we need more informative use cases for such statistics? Can someone add statistics for additional columns (possibly not present in an index)?
*  Client: can you expand on this term? Does it mean an Ignite client node? Is the best SQL plan chosen on the node that started the request? If so, what about a client with limited CPU?
*  « If there are no statistics in all of them - client will choose random   » - not random but based on affinity, isn't it?
*  « After getting statistics client will cache it and server node it to renew statistics from same node. » I don't understand this approach, can you clarify it, please?
*  What's the storage mechanism for client node statistics?
*  Can we use a thin client without disks in such cases?
Thanks!

>:
> 
>Follow up
>
>Igniters,
>
>Are there any comments on this IEP?
>
>JFYI, the IEP has been renamed and moved here [1]
>
>[1]  https://cwiki.apache.org/confluence/display/IGNITE/IEP-58%3A+Statistics+for+SQL+query+optimization
>
>On Thu, Sep 24, 2020 at 2:30 PM Sasha Belyak < rtsforex@gmail.com > wrote:
>>
>> Igniters,
>> I've prepared an IEP [1], please review and let me know what you think.
>>
>> In particular, I'd like to discuss the new subsystem that collects
>> statistics to optimize SQL query execution.
>> [1]  https://cwiki.apache.org/confluence/display/IGNITE/IEP-58+Statistics

Re: IEP-58: Statistics

Posted by Andrey Gura <ag...@apache.org>.

Follow up

Igniters,

Are there any comments on this IEP?

JFYI, the IEP has been renamed and moved here [1]

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-58%3A+Statistics+for+SQL+query+optimization

On Thu, Sep 24, 2020 at 2:30 PM Sasha Belyak <rt...@gmail.com> wrote:
>
> Igniters,
> I've prepared an IEP [1], please review and let me know what you think.
>
> In particular, I'd like to discuss the new subsystem that collects
> statistics to optimize SQL query execution.
> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-58+Statistics