You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/07 23:06:19 UTC

[GitHub] [arrow-datafusion] comphead opened a new pull request, #4133: Use f64::total_cmp instead of OrderedFloat

comphead opened a new pull request, #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133

# Which issue does this PR close?

Closes #4051 .

# Rationale for this change
See #4051

# What changes are included in this PR?

# Are there any user-facing changes?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018288403


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }
             }
             (Float32(_), _) => false,
             (Float64(v1), Float64(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }

Review Comment:
   Sounds good, I would recommend using `to_bits` instead of `total_tmp().is_eq()`, as LLVM isn't smart enough to figure out that the latter is just bitwise equality - https://github.com/apache/arrow-rs/pull/3060



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018311725


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +221,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

Review Comment:
   Yes, we will need to unwrap the option, I keep forgetting that ScalarValue has typed nulls for some reason :laughing: 
   
   partial_cmp on Option, will call through to partial_cmp on f32, which is not the same as total_cmp



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#issuecomment-1308059460

   > @comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...
   
   Hi @tustvold I have implemented hasher through `std::hash` but the impl is the same as in hash_utils. HashValue trait is not the same as Hash, afaik. Let me know if the hasher should be done in other way


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017439212


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat

Review Comment:
   To be consistent with the hash implementation, this should also use total_cmp. Otherwise two equal values, e.g. +0 and -0, will have different hashes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017439978


##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -748,10 +696,7 @@ mod tests {
     #[test]
     fn test_merge_unsorted_against_uniform_distro() {
         let t = TDigest::new(100);
-        let values: Vec<_> = (1..=1_000_000)
-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
+        let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   ```suggestion
           let values: Vec<_> = (1..=1_000_000).map(f64::from).collect();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017443720


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }
             }
             (Float32(_), _) => false,
             (Float64(v1), Float64(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }

Review Comment:
   ```suggestion
                  v1.to_bits() == v2.to_bits()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017444159


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }

Review Comment:
   ```suggestion
                   v1.to_bits() == v2.to_bits()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017439867


##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -788,10 +728,7 @@ mod tests {
 
         for _ in 1..=100 {
             let t = TDigest::new(100);
-            let values: Vec<_> = (1..=1_000)
-                .map(f64::from)
-                .map(|v| OrderedFloat::from(v as f64))
-                .collect();
+            let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   ```suggestion
               let values: Vec<_> = (1..=1_000).map(f64::from).collect();
   ```



##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -766,13 +711,8 @@ mod tests {
     #[test]
     fn test_merge_unsorted_against_skewed_distro() {
         let t = TDigest::new(100);
-        let mut values: Vec<_> = (1..=600_000)
-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
-        for _ in 0..400_000 {
-            values.push(OrderedFloat::from(1_000_000_f64));
-        }
+        let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   ```suggestion
           let mut values: Vec<_> = (1..=600_000).map(f64::from).collect();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018285802


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }
             }
             (Float32(_), _) => false,
             (Float64(v1), Float64(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }

Review Comment:
   v1 is `Option<f32>`
   what if 
   ```
                   match (v1, v2) {
                       (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
                       _ => v1.eq(v2),
                   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018311725


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +221,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

Review Comment:
   Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason :laughing: 
   
   partial_cmp on Option, will call through to partial_cmp on f32, which is not the same as total_cmp



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018314133


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat
+                match (v1, v2) {
+                    (Some(f1), Some(f2)) if f1.is_nan() && f2.is_nan() => true,
+                    _ => v1.eq(v2),
+                }

Review Comment:
   v1 is `Option<f32>`
   what if
   ```
                   match (v1, v2) {
                       (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
                       _ => v1.eq(v2),
                   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018286902


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat

Review Comment:
   What if 
   ```
                   match (v1, v2) {
                       (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
                       _ => v1.eq(v2),
                   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018558193


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +215,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),

Review Comment:
   I think this will now return None when comparing nulls, which isn't consistent with the other types



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017439212


##########
datafusion/common/src/scalar.rs:
##########
@@ -132,15 +129,19 @@ impl PartialEq for ScalarValue {
             (Boolean(v1), Boolean(v2)) => v1.eq(v2),
             (Boolean(_), _) => false,
             (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat

Review Comment:
   To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017438172


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +221,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

Review Comment:
   ```suggestion
               (Float32(v1), Float32(v2)) => v1.total_cmp(v2),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] ursabot commented on pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

ursabot commented on PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#issuecomment-1310811262

   Benchmark runs are scheduled for baseline = 509c82c6d624bb63531f67531195b562a241c854 and contender = 5883e43db6c16d3ac3616302606849abbfbc86eb. 5883e43db6c16d3ac3616302606849abbfbc86eb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/22ba783004d447e5a0410a45da93f10f...fc52936204ef4b0584c5eeae9395f2ed/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/85f7a4ada6044257a8c73414ca59e9a2...aa495b138605447cb470b6465247ef91/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a43c4fc2b151421c83656ed3d91c4c5f...f6f74b6418e6495e885fcbdda856ee61/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c8b5729573c4440d8c1ab415b7ddbc4f...29455a47c1d541bf8f89c7004ae8feb5/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018296999


##########
datafusion/common/src/scalar.rs:
##########
@@ -637,14 +643,8 @@ impl std::hash::Hash for ScalarValue {
                 s.hash(state)
             }
             Boolean(v) => v.hash(state),
-            Float32(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
-            Float64(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
+            Float32(v) => v.map(Fl).hash(state),

Review Comment:
   v is `Option<f32>` is supports Hash, but we have to wrap f32 into some wrapper supporting hash. `Fl` in this case, I didn't find how to implement `Hash` directly on `f32/f64`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018512984


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +221,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

Review Comment:
   Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#issuecomment-1306477153

   @comphead I would recommend creating a newtype wrapper around a num::Float that implements Hash using hash_utils, eq using total_cmp, etc...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018282662


##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -788,10 +728,7 @@ mod tests {
 
         for _ in 1..=100 {
             let t = TDigest::new(100);
-            let values: Vec<_> = (1..=1_000)
-                .map(f64::from)
-                .map(|v| OrderedFloat::from(v as f64))
-                .collect();
+            let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   Done



##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -766,13 +711,8 @@ mod tests {
     #[test]
     fn test_merge_unsorted_against_skewed_distro() {
         let t = TDigest::new(100);
-        let mut values: Vec<_> = (1..=600_000)
-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
-        for _ in 0..400_000 {
-            values.push(OrderedFloat::from(1_000_000_f64));
-        }
+        let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   Done



##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -748,10 +696,7 @@ mod tests {
     #[test]
     fn test_merge_unsorted_against_uniform_distro() {
         let t = TDigest::new(100);
-        let values: Vec<_> = (1..=1_000_000)
-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
+        let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018286506


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +221,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

Review Comment:
   v1 is an `Option<f32>`, it supports partial_cmp, not total_cmp. let me know if you ok if I unwrap it to floats, the same way as done for Decimals.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018312577


##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -658,7 +608,7 @@ impl TDigest {
 
         let max = cast_scalar_f64!(&state[3]);
         let min = cast_scalar_f64!(&state[4]);
-        assert!(max >= min);
+        assert!(max.is_nan() || min.is_nan() || max >= min);

Review Comment:
   max/min can be none, direct f64 comparison works not correctly in this case. replaced with `assert(max.total_cmp(&min).is_ge());`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017438854


##########
datafusion/common/src/scalar.rs:
##########
@@ -637,14 +643,8 @@ impl std::hash::Hash for ScalarValue {
                 s.hash(state)
             }
             Boolean(v) => v.hash(state),
-            Float32(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
-            Float64(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
+            Float32(v) => v.map(Fl).hash(state),

Review Comment:
   I think this can just call HashValue on v?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#issuecomment-1306344642

   @tustvold I have replaced almost all entries OrderedFloat to f64. Still thinking how to use you hasher to remove OrderedFloat from Hash. 
   As your trait implement HashValue and ScalarValue requires std::cmp::Hash


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1017440216


##########
datafusion/physical-expr/src/aggregate/tdigest.rs:
##########
@@ -658,7 +608,7 @@ impl TDigest {
 
         let max = cast_scalar_f64!(&state[3]);
         let min = cast_scalar_f64!(&state[4]);
-        assert!(max >= min);
+        assert!(max.is_nan() || min.is_nan() || max >= min);

Review Comment:
   Why this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018314743


##########
datafusion/common/src/scalar.rs:
##########
@@ -637,14 +643,8 @@ impl std::hash::Hash for ScalarValue {
                 s.hash(state)
             }
             Boolean(v) => v.hash(state),
-            Float32(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
-            Float64(v) => {
-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
+            Float32(v) => v.map(Fl).hash(state),

Review Comment:
   Fair, I think there is a way to clean this up, but we can do that in a follow on PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

comphead commented on code in PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133#discussion_r1018660640


##########
datafusion/common/src/scalar.rs:
##########
@@ -221,17 +215,9 @@ impl PartialOrd for ScalarValue {
             (Decimal128(_, _, _), _) => None,
             (Boolean(v1), Boolean(v2)) => v1.partial_cmp(v2),
             (Boolean(_), _) => None,
-            (Float32(v1), Float32(v2)) => {
-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),

Review Comment:
   Right! Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] tustvold merged pull request #4133: Use f64::total_cmp instead of OrderedFloat

Posted by GitBox <gi...@apache.org>.

tustvold merged PR #4133:
URL: https://github.com/apache/arrow-datafusion/pull/4133


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org