You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/09 17:59:14 UTC

[GitHub] [arrow] maxburke opened a new pull request #7693: Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

maxburke opened a new pull request #7693:
URL: https://github.com/apache/arrow/pull/7693


   This issue also applies to 32-bit integers miscast to 64-bit integers,
   however on little endian machines the cast + truncation results in a
   correct value.
   
   In the float32 case, the mis-read 64-bit float is denormal and so when
   it's converted to 32-bit it becomes zero.
   
   This only occurs in the case where there's one (4-byte) element.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656917911


   > @nevi-me When I was testing it we weren't seeing it while reading the record batch back directly; we had to complete the serialization to a byte stream and then read it back.
   
   Yes, writing to a file stream, and reading that back, works. Try the below:
   
   ```rust
       #[test]
       fn test_arrow_single_float_row() {
           let schema = Schema::new(vec![
               Field::new("a", DataType::Float32, false),
               Field::new("b", DataType::Float32, false),
               Field::new("c", DataType::Int32, false),
               Field::new("d", DataType::Int32, false),
           ]);
           let arrays = vec![
               Arc::new(Float32Array::from(vec![1.23])) as ArrayRef,
               Arc::new(Float32Array::from(vec![-6.50])) as ArrayRef,
               Arc::new(Int32Array::from(vec![2])) as ArrayRef,
               Arc::new(Int32Array::from(vec![1])) as ArrayRef,
           ];
           let batch = RecordBatch::try_new(Arc::new(schema.clone()), arrays).unwrap();
           // create stream writer
           let file = File::create("target/debug/testdata/float.stream").unwrap();
           let mut stream_writer = StreamWriter::try_new(file, &schema).unwrap();
           stream_writer.write(&batch).unwrap();
           stream_writer.finish().unwrap();
   
           // read stream back
           let file = File::open("target/debug/testdata/float.stream").unwrap();
           let mut reader = StreamReader::try_new(file).unwrap();
   
           let read_batch = reader.next_batch().unwrap().unwrap();
           // TODO: test the output 
       }
   ```
   
   I was able to reproduce the 0.0 values without your fix, and the correct numbers with the fix.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656271916


   https://issues.apache.org/jira/browse/ARROW-9391


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #7693:
URL: https://github.com/apache/arrow/pull/7693


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-657473066


   Hey @maxburke, may you please kindly rebase this. I can't push to urbanlogiq's fork, access rights issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] maxburke commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
maxburke commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656774372


   I don't know what's happening in the Travis build for it to indicate failure. Is it a relevant failure?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] maxburke commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
maxburke commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-657764188


   @nevi-me done!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656859641


   Hey @maxburke, I think I understand the gist of the problem, but do you need to create the `ParquetBufWriter` struct to test this? Wouldn't writing a single record batch with the arrays to a stream, then reading it back, work? Or am I oversimplifying the problem?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] maxburke commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32's to be incorrectly cast to float64 float64s in the case where a record batch only contains one row.

Posted by GitBox <gi...@apache.org>.
maxburke commented on pull request #7693:
URL: https://github.com/apache/arrow/pull/7693#issuecomment-656908743


   @nevi-me When I was testing it we weren't seeing it while reading the record batch back directly; we had to complete the serialization to a byte stream and then read it back.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org