Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/06/14 23:37:00 UTC

[jira] [Updated] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

     [ https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated IMPALA-8925:
---------------------------------
    Priority: Minor  (was: Major)

> Consider replacing ClientRequestState ResultCache with result spooling
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-8925
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8925
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Clients
>            Reporter: Sahil Takiar
>            Priority: Minor
>
> The {{ClientRequestState}} maintains an internal results cache (which is really just a {{QueryResultSet}}) in order to provide support for the {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission control integration
>  * It has a max size; if the size is exceeded, the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide a few benefits. It should be able to fit more rows, since it can spill to disk, and its memory is better tracked, since it integrates with both admitted and reserved memory. Hue currently sets the max result set fetch size to the value at [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61]. It would be good to check how well that value works for Hue users so we can decide whether replacing the current result cache with result spooling makes sense.
> This would require some changes to result spooling as well. Currently, it discards rows as soon as it reads them from the underlying {{BufferedTupleStream}}. It would need the ability to reset the read cursor, which would in turn require changes to the {{PlanRootSink}} interface.
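
The cache limitations listed in the description (a max size, and emptying on overflow) can be illustrated with a toy model. This is only a sketch in Python, not Impala's C++ code: the class name ToyResultCache, its methods, and the row-count limit are all hypothetical, and it assumes that once the cache has been emptied it is not refilled, so a FETCH_FIRST-style restart is no longer possible.

```python
from typing import List, Optional


class ToyResultCache:
    """Toy model of a bounded in-memory result cache (hypothetical, not Impala code)."""

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows
        # None means the cache was emptied after exceeding its limit.
        self.rows: Optional[List[str]] = []

    def add(self, batch: List[str]) -> None:
        # Assumption: once emptied, the cache stays empty for the rest of the query.
        if self.rows is None:
            return
        if len(self.rows) + len(batch) > self.max_rows:
            # Exceeding the limit empties the cache; nothing is spilled to disk.
            self.rows = None
            return
        self.rows.extend(batch)

    def can_restart(self) -> bool:
        # A restart from the first row needs every row to still be cached.
        return self.rows is not None


cache = ToyResultCache(max_rows=4)
cache.add(["r1", "r2"])
restart_ok_small = cache.can_restart()
cache.add(["r3", "r4", "r5"])  # 5 rows > 4, so the cache is emptied
restart_ok_large = cache.can_restart()
print(restart_ok_small, restart_ok_large)
```

In this model a result set larger than the limit loses the ability to restart entirely, which is the limitation that a disk-spilling store would be expected to relax.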

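The difference between discarding rows on read and keeping them behind a resettable read cursor can be sketched as follows. This is a simplified Python model, not the actual {{BufferedTupleStream}} or {{PlanRootSink}} API: the class names DiscardingSpool and RewindableSpool and the method reset_read_cursor are hypothetical, and an in-memory list stands in for a stream that could spill to disk.

```python
from collections import deque
from typing import Deque, List


class DiscardingSpool:
    """Rows are dropped as soon as they are read (models the behaviour described above)."""

    def __init__(self, rows: List[str]) -> None:
        self._rows: Deque[str] = deque(rows)

    def read(self, n: int) -> List[str]:
        out: List[str] = []
        while self._rows and len(out) < n:
            out.append(self._rows.popleft())  # the row is gone after this read
        return out


class RewindableSpool:
    """Rows are kept after reading so the read cursor can be reset (hypothetical)."""

    def __init__(self, rows: List[str]) -> None:
        self._rows = list(rows)
        self._cursor = 0

    def read(self, n: int) -> List[str]:
        out = self._rows[self._cursor:self._cursor + n]
        self._cursor += len(out)
        return out

    def reset_read_cursor(self) -> None:
        # Supports a fetch-from-first-row request without re-running the query.
        self._cursor = 0


discarding = DiscardingSpool(["r1", "r2", "r3"])
discarding.read(2)
leftover = discarding.read(3)  # only the unread row remains

spool = RewindableSpool(["r1", "r2", "r3"])
first_pass = spool.read(2)
spool.reset_read_cursor()
second_pass = spool.read(3)  # all rows are readable again
```

The trade-off is that a rewindable spool must retain every row until the query is closed, so its memory and disk footprint grows with the full result set rather than with the unread portion.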

