You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Alex Rodoni (JIRA)" <ji...@apache.org> on 2018/06/08 00:52:00 UTC

[jira] [Closed] (IMPALA-5740) Clarify STRING size limit

     [ https://issues.apache.org/jira/browse/IMPALA-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rodoni closed IMPALA-5740.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> Clarify STRING size limit
> -------------------------
>
>                 Key: IMPALA-5740
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5740
>             Project: IMPALA
>          Issue Type: Documentation
>          Components: Docs
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Alex Rodoni
>            Priority: Critical
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> The Impala docs currently state that strings have a maximum size of 32kb. http://impala.apache.org/docs/build/html/topics/impala_string.html
> This is misleading and causes confusion. We should clarify this in a way that makes it clear that 32kb is not the actual limit, but also makes it clear that performance and memory consumption will degrade with larger strings.
> Here's what the behaviour is. Unsure the best way to succinctly characterise it.
> * We expect that queries operating on 32KB strings will work reliably and not hit significant performance or memory problems (unless you have very complex queries, very many columns, etc).
> * There is an absolute hard limit of ~ 2GB on strings.
> * Memory consumption of queries will grow as string sizes increase and the probability of hitting "memory limit exceeded" will increase.
> * Performance of queries will decrease as strings get larger.
> * The row size (i.e. total size of all string and other columns) is limited in various places by the implementation of various operators.
> ** Rows coming from the right side of any hash join
> ** Rows coming from either side of a spilling hash join
> ** Rows being sorted
> ** Rows in a grouping aggregation.
> In Impala <= 2.9 the default limit in those places is 8MB.
> With the IMPALA-3200 changes that default number may decrease significantly, to 2MB or less, but there will be a new query option, something like MAX_ROW_SIZE, that will make this row size limit configurable on a per-query basis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org