Posted to issues@arrow.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2019/11/26 01:36:00 UTC

[jira] [Comment Edited] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent

    [ https://issues.apache.org/jira/browse/ARROW-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982040#comment-16982040 ] 

Liya Fan edited comment on ARROW-7254 at 11/26/19 1:35 AM:
-----------------------------------------------------------

Hi [~lidavidm], IMO the behavior of the Java API is as expected: VectorSchemaRoot#setRowCount calls setValueCount on each underlying vector. However, for BaseVariableWidthVector, setValueCount only ensures there is enough capacity in the validity buffer and the offset buffer, so the data buffer may still not have enough capacity. That is why we must call setSafe instead of set.

For BaseFixedWidthVector, there is no such problem. 

Maybe I don't fully understand your point?
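
To make the set/setSafe distinction above concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not Arrow's implementation: the class ToyVarWidthVector and its fields and methods are hypothetical, and it only mimics the idea that set assumes data capacity was already reserved while setSafe grows the buffers first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only, NOT Arrow's implementation. All names are hypothetical.
class ToyVarWidthVector {
  private int[] offsets; // offsets[i]..offsets[i + 1] delimits value i
  private byte[] data;   // concatenated bytes of all values

  ToyVarWidthVector(int valueCapacity, int dataCapacity) {
    offsets = new int[valueCapacity + 1];
    data = new byte[dataCapacity];
  }

  // Models "set": assumes the caller already reserved enough data capacity.
  void set(int index, byte[] value) {
    int start = offsets[index];
    if (start + value.length > data.length) {
      throw new IndexOutOfBoundsException("data buffer too small");
    }
    System.arraycopy(value, 0, data, start, value.length);
    offsets[index + 1] = start + value.length;
  }

  // Models "setSafe": grows the offset and data buffers if needed, then writes.
  void setSafe(int index, byte[] value) {
    if (index + 1 >= offsets.length) {
      offsets = Arrays.copyOf(offsets, Math.max(index + 2, offsets.length * 2));
    }
    int needed = offsets[index] + value.length;
    if (needed > data.length) {
      data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
    }
    set(index, value);
  }

  int dataCapacity() {
    return data.length;
  }

  public static void main(String[] args) {
    byte[] asdf = "asdf".getBytes(StandardCharsets.UTF_8);
    // Room for two offsets entries, but no data capacity reserved.
    ToyVarWidthVector v = new ToyVarWidthVector(2, 0);
    try {
      v.set(0, asdf);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("set failed: " + e.getMessage());
    }
    v.setSafe(0, asdf);
    System.out.println("data capacity after setSafe: " + v.dataCapacity());
  }
}
```

In this model, reserving slots (the analogue of setValueCount) says nothing about how many bytes the values will need, which is why the unchecked write can fail while the checked one succeeds.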


> BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
> --------------------------------------------------------------------------
>
>                 Key: ARROW-7254
>                 URL: https://issues.apache.org/jira/browse/ARROW-7254
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.15.1
>            Reporter: David Li
>            Priority: Minor
>
> The following program writes a file that PyArrow either segfaults on (0.14.1) or rejects with an error (0.15.1) when reading it: {{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}.
> Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes it. While it seems from the new documentation that we should (must?) call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected to get an invalid file by using {{setSafe}}, either. 
> Full traceback:
> {noformat}
> > python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
>     table = self.read_all()
>   File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
>   File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
>   File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
> {noformat}
>  
> Full program:
> {code:java}
> import java.io.OutputStream;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.Collections;
> import org.apache.arrow.memory.BufferAllocator;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.ipc.ArrowStreamWriter;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> public class AsdfTest {
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
>     try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
>       root.setRowCount(2);
>       VarCharVector v = (VarCharVector) root.getVector("a");
>       v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
>       try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
>         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
>         writer.writeBatch();
>         writer.close();
>       }
>     }
>   }
> }
> {code}
> {{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} instead of {{setSafe}} will fail in Java.
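
One way to read the error message above ("slot0!=4" at offset index 2) is that the offsets buffer ended up as [0, 4, 0]. The sketch below is a plain-Java toy model of how that could happen. It is a guess consistent with the reported symptoms, not Arrow's actual code: the class ToyOffsets, the lastSet field, and the hole-filling loops are assumptions made for illustration.

```java
import java.util.Arrays;

// Toy model only, NOT Arrow's code. Names and the hole-filling rule are assumptions.
class ToyOffsets {
  final int[] offsets = new int[3]; // room for two values
  int lastSet = -1;                 // index of the last slot considered written

  // Models setValueCount: carry the previous end offset over unset slots.
  void setValueCount(int count) {
    for (int i = lastSet + 1; i < count; i++) {
      offsets[i + 1] = offsets[i];
    }
    lastSet = count - 1;
  }

  // Models setSafe for a value of the given byte length.
  void setSafe(int index, int length) {
    for (int i = lastSet + 1; i < index; i++) {
      offsets[i + 1] = offsets[i]; // fill holes before the written slot
    }
    offsets[index + 1] = offsets[index] + length;
    lastSet = index;
  }

  public static void main(String[] args) {
    ToyOffsets v = new ToyOffsets();
    v.setValueCount(2); // as in root.setRowCount(2) before writing
    v.setSafe(0, 4);    // "asdf" is 4 bytes
    System.out.println(Arrays.toString(v.offsets)); // [0, 4, 0]
    v.setValueCount(2); // setting the count again after writing
    System.out.println(Arrays.toString(v.offsets)); // [0, 4, 4]
  }
}
```

Under these assumptions, setting the count before writing leaves the trailing offset stale, and setting it again after the write repairs it, which matches the workaround described in the report.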


