Posted to dev@arrow.apache.org by Li Jin <ic...@gmail.com> on 2016/12/07 16:00:46 UTC

What's the best way to construct an Arrow record batch for testing/validation in Java?

Hello!

I am trying to test a function that converts a list of data into an Arrow
record batch. To do that, I need to compare the function's output against a
"correct" Arrow record batch, but I am struggling to create that "correct"
record batch.

My test data is a list of rows, each with an integer column "a"; one row has
a null value:
[{a: 1}, {a: 2}, {a: 3}, {a: null}]

Is there a data format (for instance, JSON) that I can easily turn into an
Arrow record batch?

Re: What's the best way to construct an Arrow record batch for testing/validation in Java?

Posted by Wes McKinney <we...@gmail.com>.
Moving those to static methods in arrow-vector should be fine, though
Julien or others may have another suggestion. Feel free to submit a PR
(and create a JIRA that corresponds to the patch).

On Wed, Dec 7, 2016 at 5:09 PM, Li Jin <ic...@gmail.com> wrote:
> Hi Wes,
>
> Thanks for the help. I got it to work by manually creating a JSON file
> similar to
> https://github.com/apache/arrow/blob/master/integration/data/simple.json
> for my test.
>
> I want to use Integration.compareSchemas and Integration.compare from
> https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java,
> but they are private. Is it possible to make these functions part of the
> Arrow Java library?
>
> Li

Re: What's the best way to construct an Arrow record batch for testing/validation in Java?

Posted by Li Jin <ic...@gmail.com>.
Hi Wes,

Thanks for the help. I got it to work by manually creating a JSON file
similar to
https://github.com/apache/arrow/blob/master/integration/data/simple.json
for my test.

I want to use Integration.compareSchemas and Integration.compare from
https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java,
but they are private. Is it possible to make these functions part of the
Arrow Java library?

Li
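
The kind of check described above can be pictured without the Arrow library.
The following is a minimal sketch in plain Java, not the code in
Integration.java: the class name BatchCompare is hypothetical, a "schema" is
reduced to a list of column names, and a column is a list of boxed integers
in which null marks a missing value. It only illustrates the idea of
comparing schemas first and then values row by row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a schema check and a value check.
// This is NOT Arrow's API; it only mirrors the idea on plain Java lists.
public class BatchCompare {

    // Fails if the two column-name lists differ.
    static void compareSchemas(List<String> expected, List<String> actual) {
        if (!expected.equals(actual)) {
            throw new IllegalArgumentException(
                "Different schemas: " + expected + " vs " + actual);
        }
    }

    // Fails on a row-count mismatch or on the first differing row.
    // Objects.equals treats null as equal only to null.
    static void compare(String column, List<Integer> expected, List<Integer> actual) {
        if (expected.size() != actual.size()) {
            throw new IllegalArgumentException(
                "Different row count for column " + column + ": "
                    + expected.size() + " vs " + actual.size());
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!Objects.equals(expected.get(i), actual.get(i))) {
                throw new IllegalArgumentException(
                    "Different value in column " + column + " at row " + i + ": "
                        + expected.get(i) + " vs " + actual.get(i));
            }
        }
    }

    public static void main(String[] args) {
        // The test rows from the original question: [{a: 1}, {a: 2}, {a: 3}, {a: null}]
        List<Integer> expected = Arrays.asList(1, 2, 3, null);
        List<Integer> actual = Arrays.asList(1, 2, 3, null);

        compareSchemas(Arrays.asList("a"), Arrays.asList("a"));
        compare("a", expected, actual);
        System.out.println("batches match");
    }
}
```

Throwing on the first difference keeps the failure message specific (column
and row), which tends to be more useful in a unit test than a bare boolean.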


Re: What's the best way to construct an Arrow record batch for testing/validation in Java?

Posted by Wes McKinney <we...@gmail.com>.
Hi Li,

This is exactly what we are doing in the integration tests. See the
"JSON_TO_ARROW" and "VALIDATE" commands in the Java integration tests:

https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/Integration.java

Here is a sample JSON data file:

https://github.com/apache/arrow/blob/master/integration/data/simple.json

In my patch at https://github.com/apache/arrow/pull/219 I started
adding more comprehensive JSON data generation, so you will be able
to generate a JSON file that matches a particular record batch schema
more easily.

- Wes
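
For the four test rows in the original question, the column "a" can be
pictured as a values array plus a validity array, with a 0 in the validity
array marking the null row. The sketch below builds that layout in plain Java
and prints it as a JSON fragment. It does not use the Arrow library: the
class name NullableIntColumn is hypothetical, and the "count", "VALIDITY" and
"DATA" keys are an assumption about the integration file layout that should
be checked against simple.json before relying on it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a list of nullable integers into a
// values-plus-validity layout. Not part of Arrow.
public class NullableIntColumn {
    final String name;
    final int[] data;      // slot value; 0 is stored where the row is null
    final int[] validity;  // 1 = value present, 0 = null

    NullableIntColumn(String name, List<Integer> rows) {
        this.name = name;
        this.data = new int[rows.size()];
        this.validity = new int[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Integer v = rows.get(i);
            validity[i] = (v == null) ? 0 : 1;
            data[i] = (v == null) ? 0 : v;
        }
    }

    // Renders the column as a JSON object; key names are assumed, see above.
    String toJson() {
        return "{\"name\": \"" + name + "\", \"count\": " + data.length
            + ", \"VALIDITY\": " + Arrays.toString(validity)
            + ", \"DATA\": " + Arrays.toString(data) + "}";
    }

    public static void main(String[] args) {
        // Rows from the question: [{a: 1}, {a: 2}, {a: 3}, {a: null}]
        NullableIntColumn a = new NullableIntColumn("a", Arrays.asList(1, 2, 3, null));
        // prints {"name": "a", "count": 4, "VALIDITY": [1, 1, 1, 0], "DATA": [1, 2, 3, 0]}
        System.out.println(a.toJson());
    }
}
```

Keeping validity separate from the values means the placeholder stored in a
null slot never has to be compared; only rows with validity 1 carry meaning.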
