Posted to dev@arrow.apache.org by Neville Dipale <ne...@gmail.com> on 2019/11/16 06:42:59 UTC

[Help Needed] Arrow IPC Reader in Rust

Hi Arrow developers,

I'm "done" with the Arrow IPC Reader in Rust (for supported data types),
but am having issues with reading some of the test data.
Specifically, I've noticed that when reading the integration test data
(primitive_generated), where I expect an array with 17 values, the Arrow
array contains 20 values.

To illustrate what's happening, I've added some debug statements to the
unit test, and the behaviour can be seen at (
https://ci.ursalabs.org/#/builders/93/builds/1550/steps/3/logs/stdio).
In the logs, several arrays report a length of 17 but print 20 values.
Three of those values are duplicates.

It's been hard trying to inspect the binary data to verify if there's an
issue with them, and I'm able to correctly read 17 values with Python, so I
suspect it has to be a Rust issue.
Would anyone have some time to look into this with me?

Thanks
Neville

Re: [Help Needed] Arrow IPC Reader in Rust

Posted by Neville Dipale <ne...@gmail.com>.
Thanks Paddy,

I've left the fix as is for now. If we come across problems with the
precision in the future, we can tweak it accordingly.

Interesting that the array printing was a separate issue; I would never
have figured it out.

On Mon, 18 Nov 2019 at 22:07, paddy horan <pa...@hotmail.com> wrote:

> I should have mentioned that I pushed the fix to your branch.
>
> P
> ________________________________
> From: paddy horan <pa...@hotmail.com>
> Sent: Monday, November 18, 2019 3:04 PM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [Help Needed] Arrow IPC Reader in Rust
>
> Hey Neville,
>
> I had a chance to look at this.  The debugging output is a separate, but
> misleading, issue.  The real cause is the precision of the 32-bit floating
> point values: the JSON data has 3 decimal places, while the array returned
> from the reader has more than 3.  Might this be because we read the values
> in as 64-bit floats and then cast?
>
> I implemented a quick fix to test and I can pass all tests locally,
> although I will leave it to you to change as I'm not sure where in your
> process it's best to adjust the precision.
>
> Regards,
> Paddy
> ________________________________
> From: paddy horan <pa...@hotmail.com>
> Sent: Saturday, November 16, 2019 1:03 PM
> To: dev@arrow.apache.org <de...@arrow.apache.org>
> Subject: Re: [Help Needed] Arrow IPC Reader in Rust
>
> Hey Neville,
>
> I'll take a look if no-one beats me to it (I might not have time today or
> tomorrow).
>
> P

Re: [Help Needed] Arrow IPC Reader in Rust

Posted by paddy horan <pa...@hotmail.com>.
I should have mentioned that I pushed the fix to your branch.

P

Re: [Help Needed] Arrow IPC Reader in Rust

Posted by paddy horan <pa...@hotmail.com>.
Hey Neville,

I had a chance to look at this.  The debugging output is a separate, but misleading, issue.  The real cause is the precision of the 32-bit floating point values: the JSON data has 3 decimal places, while the array returned from the reader has more than 3.  Might this be because we read the values in as 64-bit floats and then cast?

I implemented a quick fix to test and I can pass all tests locally, although I will leave it to you to change as I'm not sure where in your process it's best to adjust the precision.

Regards,
Paddy
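
The precision mismatch described in this message can be reproduced with the Rust standard library alone. The sketch below is not the fix that was pushed to the branch, and it does not use the Arrow crate. The helpers `round_to_places` and `approx_eq` are hypothetical names introduced only for illustration, and the value 1.234 is an assumed example of a JSON number with 3 decimal places. It shows that a value parsed as a 64-bit float and cast to 32 bits no longer matches the 64-bit original once widened again, and that rounding both sides to a fixed number of decimals makes the comparison pass.

```rust
/// Hypothetical helper: round a 32-bit float to `places` decimal places.
/// The scaling is done in f64 so that it does not add f32 rounding error.
fn round_to_places(value: f32, places: i32) -> f32 {
    let factor = 10f64.powi(places);
    ((value as f64 * factor).round() / factor) as f32
}

/// Hypothetical helper: compare two f32 slices element by element after
/// rounding both sides to `places` decimal places.
fn approx_eq(left: &[f32], right: &[f32], places: i32) -> bool {
    left.len() == right.len()
        && left
            .iter()
            .zip(right)
            .all(|(a, b)| round_to_places(*a, places) == round_to_places(*b, places))
}

fn main() {
    // A value as it might appear in JSON, parsed as a 64-bit float.
    let from_json: f64 = "1.234".parse().unwrap();

    // Casting to f32 picks the nearest 32-bit value, which is not exactly 1.234.
    let as_f32 = from_json as f32;

    // Widening back to f64 exposes the extra digits beyond the third decimal.
    println!("f64 from JSON: {}", from_json);
    println!("f32 widened:   {}", as_f32 as f64);
    assert_ne!(as_f32 as f64, from_json);

    // Rounding to 3 decimal places brings the two representations back in line.
    assert_eq!(round_to_places(as_f32, 3), 1.234_f32);
    assert!(approx_eq(&[as_f32], &[1.234_f32], 3));
}
```

Rounding to a fixed number of decimals is only one option. Comparing with a small tolerance would also work, and which is preferable depends on where in the reader or the test the values are compared.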

Re: [Help Needed] Arrow IPC Reader in Rust

Posted by paddy horan <pa...@hotmail.com>.
Hey Neville,

I'll take a look if no-one beats me to it (I might not have time today or tomorrow).

P
