Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2016/11/21 22:05:45 UTC

Java-C++ integration tests -- on the home stretch

hi folks,

After a long road, we're getting very close to having tests proving
that the Java and C++ Arrow implementations are binary compatible --
this will be an exciting major milestone for the project. If you
haven't been following the recent JIRAs, these tests work as
follows:

1) Testing dataset is specified in JSON format

2) Producer library (e.g. Java) reads JSON into Arrow in-memory, then
writes it out to a file in the Arrow IPC binary format

3) Consumer library (e.g. C++) attempts to read both the JSON and the
binary file yielded by the producer library. The consumer compares the
in-memory schemas and columnar data structures and indicates whether
they are binary-identical
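
As a rough illustration of the three steps above, here is a minimal
Python sketch. It is not the project's test harness and does not use the
Arrow libraries: the binary layout is a toy stand-in for the Arrow file
format, and every name in it (DATASET_JSON, json_to_table, write_binary,
read_binary, validate) is made up for the example.

```python
import json
import struct

# Hypothetical JSON test dataset: a schema plus the values of each column.
DATASET_JSON = json.dumps({
    "schema": [{"name": "id", "type": "int32"},
               {"name": "score", "type": "float64"}],
    "columns": {"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]},
})

# struct format codes for the two toy types used in this sketch.
FORMATS = {"int32": "i", "float64": "d"}


def json_to_table(text):
    """Step 1: load the JSON dataset into an in-memory (schema, columns) pair."""
    doc = json.loads(text)
    schema = [(f["name"], f["type"]) for f in doc["schema"]]
    columns = {name: list(doc["columns"][name]) for name, _ in schema}
    return schema, columns


def write_binary(schema, columns):
    """Step 2 (producer): serialize to a toy little-endian binary layout."""
    out = bytearray(struct.pack("<I", len(schema)))
    for name, typ in schema:
        for text in (name, typ):
            raw = text.encode("utf-8")
            out += struct.pack("<I", len(raw)) + raw
        values = columns[name]
        out += struct.pack("<I", len(values))
        out += struct.pack("<%d%s" % (len(values), FORMATS[typ]), *values)
    return bytes(out)


def read_binary(data):
    """Step 3 (consumer): parse the toy binary layout back into memory."""
    pos = 0

    def take(fmt):
        nonlocal pos
        values = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return values

    def take_str():
        nonlocal pos
        (length,) = take("<I")
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    (num_fields,) = take("<I")
    schema, columns = [], {}
    for _ in range(num_fields):
        name = take_str()
        typ = take_str()
        (count,) = take("<I")
        columns[name] = list(take("<%d%s" % (count, FORMATS[typ])))
        schema.append((name, typ))
    return schema, columns


def validate(json_text, binary):
    """Step 3 (consumer): compare the JSON-derived and binary-derived data."""
    return json_to_table(json_text) == read_binary(binary)


if __name__ == "__main__":
    schema, columns = json_to_table(DATASET_JSON)
    blob = write_binary(schema, columns)
    print("compatible:", validate(DATASET_JSON, blob))
```

In the actual tests the producer and consumer are separate programs
written in different languages; both roles run in one process here only
to show the control flow.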

I found a couple of initial incompatibilities in the file format
implementations, cited here:
https://github.com/apache/arrow/pull/211#issuecomment-262080545.

Thanks
Wes

Re: Java-C++ integration tests -- on the home stretch

Posted by Julien Le Dem <ju...@dremio.com>.
Great!
Looking at your PR and opening JIRAs.

-- 
Julien

Re: Java-C++ integration tests -- on the home stretch

Posted by Bryan Cutler <cu...@gmail.com>.
This is great, nice job guys!

On Fri, Dec 9, 2016 at 2:15 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Congrats guys, great work!

Re: Java-C++ integration tests -- on the home stretch

Posted by Jason Altekruse <al...@gmail.com>.
Congrats guys, great work!

On Fri, Dec 9, 2016 at 2:14 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Woot!
> 🎉

Re: Java-C++ integration tests -- on the home stretch

Posted by Julien Le Dem <ju...@dremio.com>.
Woot!
🎉

On Fri, Dec 9, 2016 at 2:07 PM, Wes McKinney <we...@gmail.com> wrote:

> We just got the integration test suite (binary compatibility between
> Java and C++) passing in Travis CI today!
>
> https://travis-ci.org/wesm/arrow/builds/182725476
>
> Big team effort, congrats on all the hard work!

-- 
Julien

Re: Java-C++ integration tests -- on the home stretch

Posted by Wes McKinney <we...@gmail.com>.
We just got the integration test suite (binary compatibility between
Java and C++) passing in Travis CI today!

https://travis-ci.org/wesm/arrow/builds/182725476

Big team effort, congrats on all the hard work!

On Fri, Dec 2, 2016 at 10:50 AM, Wes McKinney <we...@gmail.com> wrote:
> We're close to having the integration tests all passing -- Julien and
> I have been hammering out the lingering nuances between the Java and
> C++ implementations. There's a number of JIRAs remaining linked to
> from this issue:
>
> https://github.com/apache/arrow/pull/219

Re: Java-C++ integration tests -- on the home stretch

Posted by Wes McKinney <we...@gmail.com>.
We're close to having the integration tests all passing -- Julien and
I have been hammering out the lingering nuances between the Java and
C++ implementations. There are a number of JIRAs remaining, linked
from this issue:

https://github.com/apache/arrow/pull/219

On Mon, Nov 21, 2016 at 8:55 PM, Wes McKinney <we...@gmail.com> wrote:
> hey Ted
>
> On Mon, Nov 21, 2016 at 8:20 PM, Ted Dunning <te...@gmail.com> wrote:
>> Wes,
>>
>> This is awesome.
>>
>> Does it, however, imply that to run the tests that a C programmer will need
>> a working Java environment and a Java programmer will need a C environment?
>>
>> Is there any way around that? Possibly by storing golden bits for the
>> in-memory images somewhere?
>>
>
> Easiest thing would be to create a Dockerfile for experimentation --
> this would be useful for benchmarking on different hardware
> environments as well. We'll want to run the integration tests either
> in Travis CI or Circle CI anyway (right now we have the Java and
> C++/Python unit tests running in separate build setups in Travis CI),
> so it hopefully wouldn't be a great deal of additional effort to put
> everything into a container recipe.
>
> Wes

Re: Java-C++ integration tests -- on the home stretch

Posted by Wes McKinney <we...@gmail.com>.
hey Ted

On Mon, Nov 21, 2016 at 8:20 PM, Ted Dunning <te...@gmail.com> wrote:
> Wes,
>
> This is awesome.
>
> Does it, however, imply that to run the tests that a C programmer will need
> a working Java environment and a Java programmer will need a C environment?
>
> Is there any way around that? Possibly by storing golden bits for the
> in-memory images somewhere?
>

Easiest thing would be to create a Dockerfile for experimentation --
this would be useful for benchmarking on different hardware
environments as well. We'll want to run the integration tests either
in Travis CI or Circle CI anyway (right now we have the Java and
C++/Python unit tests running in separate build setups in Travis CI),
so it hopefully wouldn't be a great deal of additional effort to put
everything into a container recipe.

Wes


Re: Java-C++ integration tests -- on the home stretch

Posted by Ted Dunning <te...@gmail.com>.
Wes,

This is awesome.

Does it, however, imply that to run the tests a C programmer will need
a working Java environment and a Java programmer will need a C environment?

Is there any way around that? Possibly by storing golden bits for the
in-memory images somewhere?
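
One way to read the golden-file idea, sketched in Python with only the
standard library. The function names and the choice of comparing SHA-256
digests are assumptions for illustration; nothing here is taken from the
Arrow repository. A byte-for-byte check like this is stricter than
comparing the decoded in-memory data, so it could flag differences that
a consumer would not care about.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def matches_golden(produced_path, golden_path):
    """True when the produced file has the same digest as the golden file."""
    return sha256_of(produced_path) == sha256_of(golden_path)
```

With golden files checked in, each implementation could be tested on its
own against the stored bytes, without the other language's toolchain
installed.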




