You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/07/28 16:13:00 UTC

[jira] [Created] (IMPALA-5740) Clarify STRING size limit

Tim Armstrong created IMPALA-5740:
-------------------------------------

             Summary: Clarify STRING size limit
                 Key: IMPALA-5740
                 URL: https://issues.apache.org/jira/browse/IMPALA-5740
             Project: IMPALA
          Issue Type: Documentation
          Components: Docs
    Affects Versions: Impala 2.10.0
            Reporter: Tim Armstrong
            Assignee: John Russell


The Impala docs currently state that strings have a maximum size of 32kb. http://impala.apache.org/docs/build/html/topics/impala_string.html

This is misleading and causes confusion. We should clarify this in a way that makes it clear that 32kb is not the actual limit, but also makes it clear that performance and memory consumption will degrade with larger strings.

Here's what the behaviour is. Unsure the best way to succinctly characterise it.
* We expect that queries operating on 32KB strings will work reliably and not hit significant performance or memory problems (unless you have very complex queries, very many columns, etc).
* There is an absolute hard limit of ~ 2GB on strings.
* Memory consumption of queries will grow as string sizes increase and the probability of hitting "memory limit exceeded" will increase.
* Performance of queries will decrease as strings get larger.
* The row size (i.e. total size of all string and other columns) is limited in various places by the implementation of various operators.
** Rows coming from the right side of any hash join
** Rows coming from either side of a spilling hash join
** Rows being sorted
** Rows in a grouping aggregation.

In Impala <= 2.9 the default limit in those places is 8MB.

With the IMPALA-3200 changes that default number may decrease significantly, to 2MB or less, but there will be a new query option, something like MAX_ROW_SIZE, that will make this row size limit configurable on a per-query basis.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)