Posted to issues@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2019/11/28 02:27:00 UTC

[jira] [Resolved] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent

     [ https://issues.apache.org/jira/browse/ARROW-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li resolved ARROW-7254.
-----------------------------
    Fix Version/s: 1.0.0
       Resolution: Fixed

Issue resolved by pull request 5898
[https://github.com/apache/arrow/pull/5898]

> BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
> --------------------------------------------------------------------------
>
>                 Key: ARROW-7254
>                 URL: https://issues.apache.org/jira/browse/ARROW-7254
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.15.1
>            Reporter: David Li
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following program writes a file that PyArrow cannot read: 0.14.1 segfaults, and 0.15.1 rejects it with the error {{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}.
> Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes it. While it seems from the new documentation that we should (must?) call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected to get an invalid file just by using {{setSafe}}, either.
> Full traceback:
> {noformat}
> > python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
>     table = self.read_all()
>   File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
>   File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
>   File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
> {noformat}
>  
> Full program:
> {code:java}
> import java.io.OutputStream;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.Collections;
> import org.apache.arrow.memory.BufferAllocator;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.ipc.ArrowStreamWriter;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> public class AsdfTest {
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
>     try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
>       root.setRowCount(2);
>       VarCharVector v = (VarCharVector) root.getVector("a");
>       v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
>       try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
>         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
>         writer.writeBatch();
>         writer.close();
>       }
>     }
>   }
> }
> {code}
> Calling {{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} instead of {{setSafe}} fails on the Java side.
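
The error message above can be read as an offsets-buffer check: slot 1 is null, so its end offset (index 2) should equal its start offset (index 1), but the file contains 0 where 4 is expected. The following is a minimal, self-contained sketch of that invariant in plain Java. It does not use the Arrow library; the class name, the method names, and the offsets array {0, 4, 0} are illustrative assumptions inferred from the error text, and the hole-filling step is only a guess at the kind of repair that a later {{setRowCount}} call triggers.

```java
import java.util.Arrays;

// Hypothetical model of a variable-width vector's offsets buffer (not Arrow code).
class OffsetInvariantSketch {

  // Returns the first offsets index that breaks the invariant, or -1 if all are consistent.
  // Slot i spans offsets[i]..offsets[i + 1]; a null slot must have zero length,
  // and no slot may have a negative length.
  static int firstInconsistentOffset(int[] offsets, boolean[] valid) {
    for (int slot = 0; slot < valid.length; slot++) {
      int start = offsets[slot];
      int end = offsets[slot + 1];
      if (end < start || (!valid[slot] && end != start)) {
        return slot + 1;
      }
    }
    return -1;
  }

  // Copies the end offset of the last written slot forward, so that every
  // later (unwritten) slot becomes an empty, zero-length slot.
  static int[] fillHoles(int[] offsets, int lastSet) {
    int[] fixed = Arrays.copyOf(offsets, offsets.length);
    for (int i = lastSet + 2; i < fixed.length; i++) {
      fixed[i] = fixed[lastSet + 1];
    }
    return fixed;
  }

  public static void main(String[] args) {
    // Assumed state after setRowCount(2) followed by setSafe(0, "asdf"):
    // slot 0 holds 4 bytes, slot 1 is null, and the last offset was never updated.
    int[] written = {0, 4, 0};
    boolean[] valid = {true, false};

    System.out.println(firstInconsistentOffset(written, valid));

    int[] fixed = fillHoles(written, 0);
    System.out.println(Arrays.toString(fixed));
    System.out.println(firstInconsistentOffset(fixed, valid));
  }
}
```

Under these assumptions the check flags index 2 for the written offsets, which matches the position PyArrow reports, and passes once the trailing offset is filled in.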



--
This message was sent by Atlassian Jira
(v8.3.4#803005)