Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/05 18:44:12 UTC

[GitHub] [arrow-datafusion] houqp opened a new issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

houqp opened a new issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512


   **Describe the bug**
   
   Passing float arrays to `create_hashes` panics with the message "not implemented" when built with nightly rustc.
   
   **To Reproduce**
   
   Rustc:
   
   ```
   $ rustc --version
   rustc 1.54.0-nightly (b663c0f4f 2021-05-29)
   ```
   
   Code:
   
   ```rust
   let random_state = RandomState::with_seeds(0, 0, 0, 0);
   let schema = Schema::new(vec![Field::new("c1", DataType::Float32, false)]);
   let batch = RecordBatch::try_new(
       Arc::new(schema),
       vec![Arc::new(Float32Array::from(vec![
           0.00005, 0.00002, 0.00003, 0.00001, 0.00004,
       ]))],
   )
   .unwrap();
   
   let hashes_buff = &mut vec![0; batch.num_rows()];
   let hashes = create_hashes(&[batch.columns()[0].clone()], &random_state, hashes_buff)?;
   ```
   
   **Expected behavior**
   
   It should not panic.
   
   **Additional context**
   
   Upstream issue: https://github.com/tkaitchuck/aHash/issues/93
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512#issuecomment-855371911


   > This is probably related to Rust's inability to support hash for f32 and f64, as `Eq` is still not stabilized for them (and Hash and Eq must be consistent).
   
   @houqp found an issue https://github.com/tkaitchuck/aHash/issues/93 in upstream (trying to hash a `usize`). I think it could be solved easily there by casting it to `u64`. It's not totally clear to me what the difference between nightly and stable is.





[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512#issuecomment-855371911


   > This is probably related to Rust's inability to support hash for f32 and f64, as `Eq` is still not stabilized for them (and Hash and Eq must be consistent).
   
   @houqp found an issue https://github.com/tkaitchuck/aHash/issues/93 in upstream (trying to hash a `usize` from a `len()` call). I think it could be solved easily there by casting it to `u64`. It's not totally clear to me what the difference between nightly and stable is.





[GitHub] [arrow-datafusion] houqp commented on issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512#issuecomment-855449964


   My CPU is an Intel i7.
   
   I think I found what's causing the difference between stable and nightly: https://github.com/tkaitchuck/aHash/issues/93#issuecomment-855449239. This turned on the `specialize` feature in ahash, which triggered the fallback code path.
   
   I think ahash just needs to implement the `usize` write in this case, as @Dandandan suggested. There is not much we can do on our end without incurring extra overhead, such as converting the byte array to u32 or u64 before hashing it.
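
To illustrate the kind of fix discussed here, below is a minimal sketch of a hasher that forwards `usize` writes to its `u64` path. `FixedWidthHasher` and `hash_len` are hypothetical names invented for this example, and the mixing step is arbitrary; this is not aHash's actual code, only an assumption about the general shape of the problem. It relies on the fact that the default `Hasher::write_usize` in the standard library forwards to `write`, so a hasher whose `write` is unimplemented panics on `usize` input unless `write_usize` is overridden.

```rust
use std::hash::Hasher;

/// Hypothetical toy hasher (not aHash's code): it supports fixed-width
/// integer writes and panics on arbitrary byte slices.
#[derive(Default)]
struct FixedWidthHasher {
    state: u64,
}

impl Hasher for FixedWidthHasher {
    fn finish(&self) -> u64 {
        self.state
    }

    // Stand-in for a code path that only handles fixed-width writes.
    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("byte-slice writes are not supported by this toy hasher")
    }

    fn write_u64(&mut self, i: u64) {
        // Arbitrary mixing step, chosen only to keep the example short.
        self.state = (self.state ^ i).wrapping_mul(0x0000_0100_0000_01b3);
    }

    // Without this override, the default `write_usize` forwards to `write`
    // and hits the `unimplemented!` above.
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64);
    }
}

/// Hashes a length value, the kind of `usize` a `len()` call returns.
fn hash_len(len: usize) -> u64 {
    let mut hasher = FixedWidthHasher::default();
    hasher.write_usize(len);
    hasher.finish()
}
```

Calling `hash_len(5)` goes through `write_u64` rather than `write`, so it returns a value instead of panicking. Removing the `write_usize` override would make the same call panic with a "not implemented" message.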





[GitHub] [arrow-datafusion] Dandandan closed issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
Dandandan closed issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512


   





[GitHub] [arrow-datafusion] Dandandan commented on issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512#issuecomment-855370906


   Interesting find, @houqp. I think this needs to be fixed upstream, by e.g. casting to .
   Any idea why it's using the fallback code of aHash on your end? What kind of CPU are you using?





[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #512: hash_join.rs's create_hashes function panics with float columns with nightly rustc

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #512:
URL: https://github.com/apache/arrow-datafusion/issues/512#issuecomment-855371497


   This is probably related to Rust's inability to support hash for f32 and f64, as `Eq` is still not stabilized for them (and Hash and Eq must be consistent).
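
For context on the `Hash` and `Eq` consistency point, here is a small sketch using only the standard library. `f32` does not implement `Hash` or `Eq`, so one common workaround is to hash the raw bit pattern obtained from `to_bits()`. The helper names `hash_f32_bits` and `equal_but_distinct_bits` are made up for this example, and this is not necessarily how `create_hashes` treats float columns.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes an `f32` through its raw IEEE 754 bit pattern, because `f32`
/// itself does not implement `Hash`.
fn hash_f32_bits(value: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// Returns true when two floats compare equal with `==` but have different
/// bit patterns, which is where bit-based hashing and float equality disagree.
fn equal_but_distinct_bits(a: f32, b: f32) -> bool {
    a == b && a.to_bits() != b.to_bits()
}
```

With these helpers, `equal_but_distinct_bits(0.0, -0.0)` is true: the two zeros compare equal yet would be hashed from different bits. NaN is the opposite case, since `f32::NAN == f32::NAN` is false even when the bits are identical. Both cases show why float equality and hashing are hard to keep consistent.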

