Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/04/17 19:11:00 UTC

[jira] [Comment Edited] (ARROW-7779) [Format] Enable integration tests for dictionaries-within-dictionaries

    [ https://issues.apache.org/jira/browse/ARROW-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085995#comment-17085995 ] 

Wes McKinney edited comment on ARROW-7779 at 4/17/20, 7:11 PM:
---------------------------------------------------------------

Sorry, it's not a type. Here is the (simplified) representation in Flatbuffers:

{code}
Field<
  name: "movie_genres",
  type: Array
  dictionary: id=1
  child[0]: Field<
      name: "item",
      type: String
      dictionary: id=0
  >
>
{code}

From an algebraic standpoint, dictionary encoding is implemented as:

{code}
Array <
  indices: Array,
  dictionary: Array
>
{code}

So if we disallow "dictionary" from itself containing dictionary-encoded data (which is not difficult to construct in-memory), then we have to sanitize the data either when the array is constructed or when it is written to IPC. Both of those options are icky, but the percentage of users who will be harmed by them is small.
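
To make the trade-off concrete, here is a minimal sketch in plain Python of the algebraic definition above and of what "sanitization" could look like. It does not use the Arrow libraries. The names PlainArray, DictArray, decode and sanitize are hypothetical and exist only for illustration, and the sketch covers only the case where the dictionary is directly dictionary-encoded, not the case where encoded data sits in a nested child.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class PlainArray:
    """Hypothetical stand-in for a non-dictionary-encoded array."""
    values: List[object]


@dataclass
class DictArray:
    """Hypothetical stand-in for Array<indices: Array, dictionary: Array>."""
    indices: List[int]
    dictionary: Union["PlainArray", "DictArray"]


def decode(arr):
    """Expand an array into a plain Python list of its logical values."""
    if isinstance(arr, PlainArray):
        return list(arr.values)
    # Decode the dictionary first (it may itself be encoded), then look up.
    dictionary_values = decode(arr.dictionary)
    return [dictionary_values[i] for i in arr.indices]


def sanitize(arr):
    """Return an equivalent array whose dictionary is not dictionary-encoded."""
    if isinstance(arr, PlainArray):
        return arr
    return DictArray(
        indices=list(arr.indices),
        dictionary=PlainArray(decode(arr.dictionary)),
    )


# A dictionary that is itself dictionary-encoded: easy to build in memory.
inner = DictArray(indices=[0, 1, 1, 2],
                  dictionary=PlainArray(["comedy", "drama", "horror"]))
outer = DictArray(indices=[3, 0, 0, 2], dictionary=inner)

clean = sanitize(outer)
```

In this sketch sanitize(outer) keeps the outer indices and replaces the nested dictionary with its decoded values, so the logical contents are unchanged. The cost is the extra decoding pass, which would have to happen either at construction time or at IPC write time.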


was (Author: wesmckinn):
Sorry, it's not a type. Here is the (simplified) representation in Flatbuffers:

{code}
Field<
  name: "movie_genres",
  type: Array<
    child[0]: Field<
      name: "item",
      type: String
      dictionary: id=0
    >
  >
  dictionary: id=1
>
{code}

From an algebraic standpoint, dictionary encoding is implemented as:

{code}
Array <
  indices: Array,
  dictionary: Array
>
{code}

So if we disallow "dictionary" from itself containing dictionary-encoded data (which is not difficult to construct in-memory), then we have to sanitize the data either when the array is constructed or when it is written to IPC. Both of those options are icky, but the percentage of users who will be harmed by them is small.

> [Format] Enable integration tests for dictionaries-within-dictionaries
> ----------------------------------------------------------------------
>
>                 Key: ARROW-7779
>                 URL: https://issues.apache.org/jira/browse/ARROW-7779
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format, Integration
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0
>
>
> The integration test is implemented but currently disabled for all implementations.


