Posted to dev@arrow.apache.org by Sebastien Binet <bi...@cern.ch> on 2019/04/03 17:21:45 UTC

round-trip tests for Arrow files

hi there,

I am working on the deserialization support for the Go backend.
at this point, I have (I think) primitive and binary/string arrays working
with a simple Arrow file I created like so:

import pyarrow as pa
data = [
    pa.array([1, 2, 3, None, 5], type="i4"),
    pa.array(['foo', 'bar', 'baz', None, "quux"]),
    pa.array([1, 2, None, 4, 5], type="f4"),
    pa.array([True, None, False, True, False]),
]

batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2', "f3"])
sink = pa.BufferOutputStream()
writer = pa.RecordBatchFileWriter(sink, batch.schema)

for i in range(5):
    writer.write_batch(batch)
writer.close()

buf = sink.getvalue()
f = open("out.dat", "wb")
f.write(buf.to_pybytes())
f.close()

and, as I said, I can now successfully read that back from Go.
but I was wondering what's the recommended way to test for this kind of
round-trip/cross-language thing.

I tried to play a bit with "integration/integration_test.py" but it fails
with:

##########################################################
C++ producing, C++ consuming
##########################################################
==========================================================
Testing file /home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
==========================================================
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/75bc5ca6_struct_example.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
--mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/75bc5ca6_struct_example.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/struct_example.json
--mode=JSON_TO_ARROW
With output:
--------------
Found schema: struct_nullable: struct<f1: int32, f2: string>

--------------
==========================================================
Testing file /home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json
==========================================================
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/b0d388ed_simple.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json --mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/b0d388ed_simple.json_as_file
--json=/home/binet/work/gonum/src/github.com/apache/arrow/integration/data/simple.json --mode=JSON_TO_ARROW
With output:
--------------
Found schema: foo: int32
bar: double
baz: string

--------------
==========================================================
Testing file testdata/generated_primitive.json
==========================================================
-- Creating binary inputs
/home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/8eacb124_generated_primitive.json_as_file
--json=testdata/generated_primitive.json --mode=JSON_TO_ARROW
Command failed: /home/binet/work/gonum/src/github.com/apache/arrow/integration/../cpp/build/latest/arrow-json-integration-test
--integration --arrow=testdata/8eacb124_generated_primitive.json_as_file
--json=testdata/generated_primitive.json --mode=JSON_TO_ARROW
With output:
--------------
Found schema: bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
fixedsizebinary_19_nullable: fixed_size_binary[19]
fixedsizebinary_19_nonnullable: fixed_size_binary[19] not null
fixedsizebinary_120_nullable: fixed_size_binary[120]
fixedsizebinary_120_nonnullable: fixed_size_binary[120] not null

--------------

is this supposed to work?
are there reference files already available somewhere?

cheers,
-s

Re: round-trip tests for Arrow files

Posted by Wes McKinney <we...@gmail.com>.
hi Sebastien,

The integration tests should indeed work (they are run in the release
verification script [1]), so something is wrong with either your C++
build or your environment if integration_test.py fails. It would be
great to get Go into the integration tests to have proof that the
implementation is compatible with the others (C++, Java, JS).

- Wes

[1]: https://github.com/apache/arrow/blob/master/dev/release/verify-release-candidate.sh#L386
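
The log in the original message shows "Command failed" followed only by the schema printed on standard output, so the actual error is not visible. One way to narrow down a build or environment problem is to rerun a single command by hand and capture its exit code and standard error. The sketch below assumes nothing about the tool beyond the flags that appear in the log; build_command and run_and_report are hypothetical helper names, not part of integration_test.py.

```python
import subprocess


def build_command(exe, arrow_path, json_path, mode="JSON_TO_ARROW"):
    # Assemble the same argument list that appears in the log:
    # --integration --arrow=... --json=... --mode=...
    return [
        exe,
        "--integration",
        f"--arrow={arrow_path}",
        f"--json={json_path}",
        f"--mode={mode}",
    ]


def run_and_report(cmd):
    # Run the command and return (exit code, stdout, stderr) so that the
    # failure reason is not lost when only stdout is echoed.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

With the paths from the log, a call would look like run_and_report(build_command(exe, "testdata/b0d388ed_simple.json_as_file", json_path)), where exe and json_path are the full paths of the arrow-json-integration-test binary and of simple.json on the local machine.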
