Posted to dev@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2019/11/25 19:30:00 UTC
[jira] [Created] (ARROW-7254) BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
David Li created ARROW-7254:
-------------------------------
Summary: BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
Key: ARROW-7254
URL: https://issues.apache.org/jira/browse/ARROW-7254
Project: Apache Arrow
Issue Type: Bug
Components: Java
Affects Versions: 0.15.1
Reporter: David Li
The following program writes a file that PyArrow cannot read: 0.14.1 segfaults, and 0.15.1 rejects it with the error {{pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4}}.
Calling {{setRowCount}} again, or calling {{setSafe}} with a higher index, fixes it. While it seems from the new documentation that we should (must?) call {{VectorSchemaRoot#setRowCount}} at the end, I wouldn't have expected to get an invalid file just by using {{setSafe}}, either.
Full traceback:
{noformat}
> python3 -c 'import pyarrow as pa; print(pa.ipc.open_stream(open("./test.bin", "rb")).read_pandas())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lidavidm/Flight/arrow-5137-auth/java/venv/lib/python3.7/site-packages/pyarrow/ipc.py", line 46, in read_pandas
    table = self.read_all()
  File "pyarrow/ipc.pxi", line 330, in pyarrow.lib._CRecordBatchReader.read_all
  File "pyarrow/public-api.pxi", line 321, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: Offset invariant failure at: 2 inconsistent value_offsets for null slot0!=4
{noformat}
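For reference, here is a minimal plain-Java sketch of the invariant the error message above seems to be describing. It does not use Arrow at all. The class and method names ({{OffsetInvariantSketch}}, {{firstInconsistentOffset}}, {{fillTrailingOffsets}}) are hypothetical, and the offsets {{[0, 4, 0]}} are only inferred from "at: 2 ... 0!=4", not read from the actual buffer. In the Arrow columnar format, a variable-width vector with N slots has N+1 offsets that must never decrease, and a null slot must have equal start and end offsets. The second method is a guess at what calling {{setRowCount}} again effectively does (carrying the last written end offset forward), not the Arrow implementation.

```java
import java.util.Arrays;

public class OffsetInvariantSketch {

    /**
     * Returns the first index i (1 <= i < offsets.length) where
     * offsets[i] < offsets[i - 1], or -1 if the offsets never decrease.
     */
    public static int firstInconsistentOffset(int[] offsets) {
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns a copy in which every offset after the end offset of slot
     * lastSet is overwritten with that end offset, so the trailing
     * (never written) slots become zero-length.
     * Assumption: lastSet is the index of the last slot that was written.
     */
    public static int[] fillTrailingOffsets(int[] offsets, int lastSet) {
        int[] fixed = Arrays.copyOf(offsets, offsets.length);
        for (int i = lastSet + 2; i < fixed.length; i++) {
            fixed[i] = fixed[lastSet + 1];
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Two slots: slot 0 holds "asdf" (4 bytes), slot 1 was never written.
        // Inferred from the error message, not dumped from the real file.
        int[] asWritten = {0, 4, 0};
        System.out.println(firstInconsistentOffset(asWritten)); // prints 2

        int[] repaired = fillTrailingOffsets(asWritten, 0);
        System.out.println(Arrays.toString(repaired));          // prints [0, 4, 4]
        System.out.println(firstInconsistentOffset(repaired));  // prints -1
    }
}
```

If this reading is right, slot 1 of the written file ends at offset 0 while slot 0 ends at offset 4, which is what the reader rejects.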
Full program:
{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class AsdfTest {
  public static void main(String[] args) throws Exception {
    // Single nullable UTF-8 column named "a".
    Schema schema = new Schema(Collections.singletonList(Field.nullable("a", new ArrowType.Utf8())));
    try (BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      // Declare two rows up front, then write only row 0; row 1 is never set.
      root.setRowCount(2);
      VarCharVector v = (VarCharVector) root.getVector("a");
      v.setSafe(0, "asdf".getBytes(StandardCharsets.UTF_8));
      try (OutputStream output = Files.newOutputStream(Paths.get("./test.bin"))) {
        // No dictionary provider is needed for this schema, hence null.
        ArrowStreamWriter writer = new ArrowStreamWriter(root, null, output);
        writer.writeBatch();
        writer.close();
      }
    }
  }
}
{code}
Calling {{v.setNull(1)}} after {{v.setSafe(0, "asdf")}} does not fix it. Using {{set}} instead of {{setSafe}} fails in Java itself.