Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/08/01 22:34:00 UTC
[jira] [Updated] (ARROW-5887) [C#] ArrowStreamWriter writes FieldNodes in wrong order

     [ https://issues.apache.org/jira/browse/ARROW-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-5887:
--------------------------------
    Fix Version/s:     (was: 1.0.0)
                   0.15.0

> [C#] ArrowStreamWriter writes FieldNodes in wrong order
> -------------------------------------------------------
>
>                 Key: ARROW-5887
>                 URL: https://issues.apache.org/jira/browse/ARROW-5887
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C#
>            Reporter: Eric Erhardt
>            Assignee: Eric Erhardt
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.14.1, 0.15.0
>
>   Original Estimate: 4h
>          Time Spent: 0.5h
>  Remaining Estimate: 3.5h
>
> When {{ArrowStreamWriter}} writes a {{RecordBatch}} that contains {{null}}s, it mixes up the columns' {{NullCount}} values.
> You can see here:
> [https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L195-L200]
> It writes the fields in {{0}} -> {{fieldCount}} order. But then [lower|https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L216-L220], it writes the fields in {{fieldCount}} -> {{0}} order.
> Looking at the [Java implementation|https://github.com/apache/arrow/blob/7b2d68570b4336308c52081a0349675e488caf11/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/FBSerializables.java#L36-L44], a comment there says:
> {quote}// struct vectors have to be created in reverse order
> {quote}
>  
> A simple test that round-trips the following RecordBatch shows the issue:
>  
> {code:java}
> var result = new RecordBatch(
>     new Schema.Builder()
>         .Field(f => f.Name("age").DataType(Int32Type.Default))
>         .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
>         .Build(),
>     new IArrowArray[]
>     {
>         // "age": validity bitmap byte is 0, so the single value is null
>         new Int32Array(
>             new ArrowBuffer.Builder<int>().Append(0).Build(),
>             new ArrowBuffer.Builder<byte>().Append(0).Build(),
>             length: 1,
>             nullCount: 1,
>             offset: 0),
>         // "CharCount": no nulls, so the validity bitmap is left empty
>         new Int32Array(
>             new ArrowBuffer.Builder<int>().Append(7).Build(),
>             ArrowBuffer.Empty,
>             length: 1,
>             nullCount: 0,
>             offset: 0)
>     },
>     length: 1);
> {code}
> Here, the "age" column should have a {{null}} in it. However, when you write this RecordBatch and read it back, the "CharCount" column has {{NullCount}} == 1 and the "age" column has {{NullCount}} == 0.
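
The swap described above can be reproduced with a small toy model. This is only a sketch in Python, not the C# writer or the FlatBuffers API: {{PrependingVectorBuilder}} and {{write_field_nodes}} are hypothetical names, and the model assumes, in line with the Java comment quoted above, that each entry added to the vector lands in front of the entries added before it.

```python
from collections import deque


class PrependingVectorBuilder:
    """Hypothetical stand-in for a back-to-front builder.

    Assumption for illustration: every add() places the new entry
    in front of all entries added earlier.
    """

    def __init__(self):
        self._items = deque()

    def add(self, item):
        # Newest entry goes to the front of the finished vector.
        self._items.appendleft(item)

    def finish(self):
        return list(self._items)


def write_field_nodes(null_counts, reverse):
    """Add one null count per column, iterating forward or in reverse."""
    builder = PrependingVectorBuilder()
    indices = range(len(null_counts))
    if reverse:
        indices = reversed(indices)
    for i in indices:
        builder.add(null_counts[i])
    return builder.finish()


# Column 0 is "age" (one null), column 1 is "CharCount" (no nulls).
null_counts = [1, 0]

forward = write_field_nodes(null_counts, reverse=False)
backward = write_field_nodes(null_counts, reverse=True)

print("forward loop :", forward)
print("reverse loop :", backward)
```

Under that assumption, the forward loop yields a vector whose first entry belongs to the last column, which is the same mix-up the round-trip shows, while the reverse loop keeps each null count with its own column.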



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)