
Posted to issues@ignite.apache.org by "Vladimir Ozerov (JIRA)" <ji...@apache.org> on 2017/09/28 06:59:01 UTC

[jira] [Updated] (IGNITE-6019) SQL: client node should not hold the whole data set in-memory when possible

     [ https://issues.apache.org/jira/browse/IGNITE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Ozerov updated IGNITE-6019:
------------------------------------
    Labels: important performance  (was: performance)

> SQL: client node should not hold the whole data set in-memory when possible
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-6019
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6019
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>    Affects Versions: 2.1
>            Reporter: Vladimir Ozerov
>            Assignee: Alexander Paschenko
>            Priority: Critical
>              Labels: important, performance
>             Fix For: 2.3
>
>
> Our SQL engine requests data from server nodes in pieces called "pages". This allows us to control memory consumption on the client side. However, our client code is currently designed so that all pages are requested from all servers before a single cursor row is returned to the user. This defeats the whole idea of "cursor" and "page", and could easily crash the client node with an OOME.
> We need to fix that and request further pages in a kind of sliding window, keeping no more than "N" pages in memory simultaneously. Note that this is not always possible, e.g. in the case of {{DISTINCT}} or non-collocated {{GROUP BY}}, where we would have to build the whole result set first anyway. So let's focus on the scenario where the whole result set is not needed.
> As everything is currently requested synchronously, page by page, in the first version it would be enough to distribute the synchronous page requests between cursor reads, without any prefetch.
> Implementation details:
> 1) The optimization should be applied only to {{skipMergeTbl=true}} cases, where the complete result set of the map queries is not needed.
> 2) The starting point is {{GridReduceQueryExecutor#query}}; see the {{skipMergeTbl=true}} branch, which is where we currently get all pages eagerly.
> 3) Get no more than one page from a server at a time: request a page, iterate over it, then request the next page.
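
The behaviour described in item 3 could be sketched roughly as follows. This is only an illustration of "request a page, iterate over it, then request the next one", not Ignite code: LazyPageCursor and PageSource are hypothetical names, PageSource stands in for a synchronous page request to a single server node, and an empty page is assumed to signal that the node has no more data.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch: a cursor that holds at most one page in memory and
 * requests the next page only when the reader has consumed the current one.
 */
final class LazyPageCursor<T> implements Iterator<T> {
    /** Hypothetical stand-in for a synchronous "next page" request to a server node. */
    public interface PageSource<T> {
        /** Returns the next page; an empty list is assumed to mean "no more data". */
        List<T> nextPage();
    }

    private final PageSource<T> src;

    /** Iterator over the single page currently held in memory. */
    private Iterator<T> cur = Collections.emptyIterator();

    /** Set once the source has returned an empty page. */
    private boolean exhausted;

    LazyPageCursor(PageSource<T> src) {
        this.src = src;
    }

    @Override public boolean hasNext() {
        // Request another page only when the current one is fully consumed.
        while (!cur.hasNext() && !exhausted) {
            List<T> page = src.nextPage();

            if (page.isEmpty())
                exhausted = true;
            else
                cur = page.iterator();
        }

        return cur.hasNext();
    }

    @Override public T next() {
        if (!hasNext())
            throw new NoSuchElementException();

        return cur.next();
    }
}
```

With this shape, constructing the cursor sends no requests at all; the first page is requested on the first hasNext() or next() call, and each later page only after the previous one has been iterated. A real implementation would also need to handle several server nodes and cursor cancellation, which this sketch leaves out.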



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)