Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/09/01 20:51:02 UTC

[jira] [Resolved] (ARROW-1407) Dictionaries can only hold a maximum of 4096 indices

     [ https://issues.apache.org/jira/browse/ARROW-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1407.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 1024
[https://github.com/apache/arrow/pull/1024]

> Dictionaries can only hold a maximum of 4096 indices
> ----------------------------------------------------
>
>                 Key: ARROW-1407
>                 URL: https://issues.apache.org/jira/browse/ARROW-1407
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java - Vectors
>    Affects Versions: 0.6.0
>            Reporter: Shayan Monshizadeh
>            Assignee: Li Jin
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: Screen Shot 2017-08-22 at 7.14.07 PM.png
>
>
> Dictionaries seem to be able to hold only 4096 indices, meaning that only vectors with 4096 values or fewer can be dictionary-encoded. The attached image is a stack trace of what happens when trying to encode a vector containing 4097 strings against a dictionary containing two distinct values.
> The error can be traced to line 95 of DictionaryEncoder.java (`setter.invoke(mutator, i, encoded);`). The indices vector, which holds the encoded values, is allocated on line 84 with `indices.allocateNew()`, and it seems that `allocateNew()` only allocates 4096 bytes of data initially. The code runs if there are 4096 rows of data or fewer. With any more rows, the same error is given.
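
The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use Arrow: `IndexBufferSketch`, its `set`/`setSafe` methods, and the `INITIAL_CAPACITY` constant are hypothetical stand-ins, and capacity is counted in elements rather than bytes for simplicity. It shows an unchecked write failing on row 4097 of a buffer with a fixed initial capacity, and one common remedy, a write that grows the buffer on demand. It is not necessarily the change made in pull request 1024.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an index vector; this is not an Arrow class. */
public class IndexBufferSketch {
    // Assumed default capacity, chosen to mirror the number in the report.
    static final int INITIAL_CAPACITY = 4096;

    private int[] data = new int[INITIAL_CAPACITY];

    /** Unchecked write: throws once index reaches the current capacity. */
    void set(int index, int value) {
        data[index] = value;
    }

    /** Checked write: doubles the capacity until the index fits, then writes. */
    void setSafe(int index, int value) {
        int capacity = data.length;
        while (index >= capacity) {
            capacity *= 2;
        }
        if (capacity != data.length) {
            data = Arrays.copyOf(data, capacity);
        }
        data[index] = value;
    }

    int get(int index) {
        return data[index];
    }

    int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        int rows = 4097; // one more row than the initial capacity

        IndexBufferSketch unchecked = new IndexBufferSketch();
        try {
            for (int i = 0; i < rows; i++) {
                unchecked.set(i, i % 2); // two distinct dictionary indices
            }
            System.out.println("unchecked write succeeded");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked write failed: " + e.getMessage());
        }

        IndexBufferSketch checked = new IndexBufferSketch();
        for (int i = 0; i < rows; i++) {
            checked.setSafe(i, i % 2);
        }
        System.out.println("checked write succeeded, capacity " + checked.capacity());
    }
}
```

When the number of rows is known before encoding, as it is when encoding an existing vector, sizing the buffer for that many values up front would avoid the repeated growth.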


