Posted to dev@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2019/11/25 19:30:00 UTC

[jira] [Created] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent

David Li created ARROW-7254:
-------------------------------

             Summary: BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
                 Key: ARROW-7254
                 URL: https://issues.apache.org/jira/browse/ARROW-7254
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
    Affects Versions: 0.15.1
            Reporter: David Li


The following program writes a file that PyArrow cannot read: 0.14.1 segfaults, and 0.15.1 rejects it with the error {{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}.

Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes it. While it seems from the new documentation that we should (must?) call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected calling {{setSafe}} to produce an invalid file, either.

Full traceback:
{noformat}
> python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
    table = self.read_all()
  File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
  File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
{noformat}
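
One plausible reading of the error, sketched below in plain Java without any Arrow dependency: in a variable-width layout, slot i occupies bytes [offsets[i], offsets[i + 1]), so the offsets must never decrease. If setting slot 0 only writes the end offset of slot 0, the offset for the unset slot 1 stays at 0 and falls below 4. The class and method names here are hypothetical, and the model is an assumption about the failure, not Arrow's actual implementation.

```java
import java.util.Arrays;

class OffsetInvariantSketch {

  // Hypothetical model: a zero-filled offset buffer for rowCount slots,
  // where a single set() writes only the end offset of the given slot.
  static int[] offsetsAfterSingleSet(int rowCount, int index, int valueLength) {
    int[] offsets = new int[rowCount + 1];
    offsets[index + 1] = offsets[index] + valueLength;
    return offsets;
  }

  // Returns the first position whose offset is smaller than the previous
  // one, or -1 if the offsets are non-decreasing.
  static int firstInconsistentOffset(int[] offsets) {
    for (int i = 1; i < offsets.length; i++) {
      if (offsets[i] < offsets[i - 1]) {
        return i;
      }
    }
    return -1;
  }

  // Copies the end offset of the last set slot into every later position,
  // so that trailing unset (null) slots become zero-length.
  static int[] carryForward(int[] offsets, int lastSet) {
    int[] fixed = Arrays.copyOf(offsets, offsets.length);
    for (int i = lastSet + 2; i < fixed.length; i++) {
      fixed[i] = fixed[lastSet + 1];
    }
    return fixed;
  }

  public static void main(String[] args) {
    // Two rows, only slot 0 set to the 4 bytes of "asdf".
    int[] offsets = offsetsAfterSingleSet(2, 0, 4);
    System.out.println(Arrays.toString(offsets));
    System.out.println(firstInconsistentOffset(offsets));

    int[] fixed = carryForward(offsets, 0);
    System.out.println(Arrays.toString(fixed));
    System.out.println(firstInconsistentOffset(fixed));
  }
}
```

Under this model the offsets after the single write are 0, 4, 0, which would match the reported {{0!=4}} at position 2, and carrying the last offset forward yields 0, 4, 4. Whether the Java library is expected to do that carrying itself before the batch is written is the question this issue raises.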
 
Full program:
{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class AsdfTest {

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));

    try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      root.setRowCount(2);
      VarCharVector v = (VarCharVector) root.getVector("a");
      v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
      try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
        ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
        writer.writeBatch();
        writer.close();
      }
    }
  }
}
{code}

{{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} instead of {{setSafe}} will fail in Java.



