Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/13 10:15:00 UTC

[jira] [Updated] (ARROW-9969) [C++] RecordBatchBuilder yields invalid result with dictionary fields

     [ https://issues.apache.org/jira/browse/ARROW-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9969:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] RecordBatchBuilder yields invalid result with dictionary fields
> ---------------------------------------------------------------------
>
>                 Key: ARROW-9969
>                 URL: https://issues.apache.org/jira/browse/ARROW-9969
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 1.0.1
>            Reporter: Pierre Belzile
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The record batch builder takes a schema as input and uses that schema when creating the record batch.
> However, when one or more fields are dictionaries, the actual data type is not known until the dictionary builder flushes, and it often does not match the type declared in the initial schema. The builder needs to update the schema to reflect the data type that was actually generated.
> The problem is easily reproduced by providing a schema with a field of type dictionary(int16(), utf8()) and adding a single row. The resulting array has data type dictionary(int8(), utf8()).
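
The mismatch described above can be illustrated with a small standalone model. This is only a sketch and does not use the Arrow API: IndexWidth, SmallestIndexWidth and ToyDictionaryBuilder are hypothetical names, and the assumption that the dictionary builder picks the narrowest index width able to address its dictionary is an inference from the int16 to int8 behaviour reported here, not something the issue states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary index type; not an Arrow type.
enum class IndexWidth { Int8, Int16, Int32, Int64 };

// Narrowest signed width whose maximum value can address every dictionary entry.
IndexWidth SmallestIndexWidth(std::size_t dictionary_size) {
  const std::uint64_t max_index = dictionary_size == 0 ? 0 : dictionary_size - 1;
  if (max_index <= static_cast<std::uint64_t>(INT8_MAX)) return IndexWidth::Int8;
  if (max_index <= static_cast<std::uint64_t>(INT16_MAX)) return IndexWidth::Int16;
  if (max_index <= static_cast<std::uint64_t>(INT32_MAX)) return IndexWidth::Int32;
  return IndexWidth::Int64;
}

// Toy dictionary builder (assumption: index width is chosen at flush time
// from the number of distinct values, ignoring any declared width).
struct ToyDictionaryBuilder {
  std::unordered_map<std::string, std::int64_t> index_of;  // value -> dictionary index
  std::vector<std::int64_t> indices;                       // one entry per appended row

  void Append(const std::string& value) {
    // Insert the value with the next free index, or find the existing entry.
    auto it = index_of.emplace(value, static_cast<std::int64_t>(index_of.size())).first;
    assert(it->second < static_cast<std::int64_t>(index_of.size()));
    indices.push_back(it->second);
  }

  // Width the flushed indices would have under the assumption above.
  IndexWidth FlushedWidth() const { return SmallestIndexWidth(index_of.size()); }
};

// True when the width declared in the schema matches what the builder produced.
bool MatchesDeclared(IndexWidth declared, const ToyDictionaryBuilder& builder) {
  return declared == builder.FlushedWidth();
}
```

In this model, a field declared with IndexWidth::Int16 that receives a single row flushes as IndexWidth::Int8, so MatchesDeclared returns false. That is the same shape of disagreement between the declared schema and the generated data that the report describes.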



--
This message was sent by Atlassian Jira
(v8.3.4#803005)