You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/10/20 15:19:18 UTC

[PR] Minor: Clarify and write some more tests for boolean interval handling [arrow-datafusion]

alamb opened a new pull request, #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885

   ## Which issue does this PR close?
   
   Related to https://github.com/apache/arrow-datafusion/issues/7883
   
   ## Rationale for this change
   
   I was very confused about what was happening and how open/closed boolean intervals were handled while working on https://github.com/apache/arrow-datafusion/issues/7883
   
   ## What changes are included in this PR?
   
   1. Update documentation with some additional rationale
   2. Add a test that shows how Boolean intervals are handled with the different combinations of open/closed intervals)
   
   ## Are these changes tested?
   Yes
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367138171


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -235,18 +247,16 @@ impl Display for Interval {
 impl Interval {
     /// Creates a new interval object using the given bounds.
     ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this

Review Comment:
   Given that there are only three valid boolean intervals, and `Interval::new()` normalizes to one of those, it might make sense to make the fields of `Interval` non `pub` so that it is not possible to construct an invalid Interval



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1089,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![

Review Comment:
   I was really confused about what the expected Intervals were, so I added test cases. it would be great if someone could double check these



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -194,6 +194,18 @@ impl Display for IntervalBound {
 /// called *unbounded* endpoint and represented using a `NULL` and written using
 /// `∞`.
 ///
+/// # Boolean Handling
+///
+/// Boolean values require special handling.
+///
+/// Given there are only two  boolean values, and they are ordered such that
+/// `false` is less than `true`, there are only three possible valid intervals
+/// for a boolean `[false, false]`, `[false, true]` or `[true, true]`, all with

Review Comment:
   The core of my confusion is that boolean `Intervals` NEVER have open bounds -- and if you try to make one with open bounds, `Interval::new` will remap them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367792837


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   Thank you @berkaysynnada  -- this was my understanding as well. So given that this test passes with the current implementation, what do you suggest we do? 
   
   1.  I wait for your refactoring PR
   2. Fix the code to not create invalid intervals? 
   3. Something else?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #7885: Minor: Clarify Boolean `Interval` handling and verify it with a test
URL: https://github.com/apache/arrow-datafusion/pull/7885


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#issuecomment-1819786925

   Superseded  by https://github.com/apache/arrow-datafusion/pull/8276


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367273147


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -194,6 +194,23 @@ impl Display for IntervalBound {
 /// called *unbounded* endpoint and represented using a `NULL` and written using
 /// `∞`.
 ///
+/// # Boolean Handling
+///
+/// Boolean values require special handling. Boolean Intervals NEVER have open
+/// bounds. If you try to and construct one one with open bounds,
+/// [`Interval::new`] will remap them to one of the three valid values.
+///
+/// Given there are only two  boolean values, and they are ordered such that

Review Comment:
   ```suggestion
   /// Given there are only two boolean values, and they are ordered such that
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367732159


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -235,18 +247,16 @@ impl Display for Interval {
 impl Interval {
     /// Creates a new interval object using the given bounds.
     ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this

Review Comment:
   In fact, that's exactly what I'm working on. I have planned to set the fields to private and create intervals with only `try_new`. If an interval can be created with the given parameters, it creates that interval. If the parameters cannot construct a valid interval (like lower: `[true` -  upper: `false]` for a boolean interval), it returns an error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1368193320


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   I will be clarifying these issues and introducing a solid API in my PR. Is it possible to keep you waiting for a while, because they will all cause conflict for me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367735130


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   A lower bound cannot be a true - open, and vice versa an upper bound cannot be a false - open. A boolean interval representing the entire range only can be [false, true].



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367294835


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1089,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![

Review Comment:
   I think they look good according to the rules added above. But unbounded rules seem not tested?
   
   Also an open `false` upper bound and an open `true` lower bound are not in the rules. From the test cases, looks like they are not mapped?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367457933


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1089,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![

Review Comment:
   > But unbounded rules seem not tested?
   
   This is a good point, I will add tests for them



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   In fact when I look at this test now it doesn't make sense as `(true, false)` isn't a valid `Interval` according to my understanding -- the interval should be `(false, true)` to represent the entire range 🤔 



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   In fact when I look at this test now it doesn't make sense as `(true, false)` isn't a valid `Interval` according to my understanding -- the interval should be `(false, true)` to represent the entire range 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367734649


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range

Review Comment:
   The second output (expected_open_closed) of this TestCase is (false, false] is the same as [true, false] which is an invalid interval. As I mentioned [there](https://github.com/apache/arrow-datafusion/pull/7885/files#r1367732159), I think these cases should give an error. Maybe we can wait until I propose the PR which I am working on and planning to submit in a few days. It will explain these initialization issues neatly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1369176128


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   I'll handle any conflicts after your new PR lands -- no worries!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367792837


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   Thank you @berkaysynnada  -- this was my understanding as well. So given that this test passes with the current implementation, that suggestes to me it is a bugwhat do you suggest we do? 
   
   1.  I wait for your refactoring PR
   2. Fix the code to not create invalid intervals? 
   3. Something else?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#issuecomment-1775177952

   Marking as a draft until @berkaysynnada 's refactor PR has landed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367456947


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range

Review Comment:
   I believe this is what the tests show, and I did not change the behavior.  However,  I don't know if this is correct or not. Perhaps @berkaysynnada  / @metesynnada  have some insight



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367470584


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,116 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {

Review Comment:
   Note I have subsequently found https://github.com/apache/arrow-datafusion/blob/1f4acbb4f0b7fde10b12684c7270069b86c386dc/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L1721-L1762 which appears to test the same things, but maybe not exhaustively 🤔 



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,116 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {

Review Comment:
   Note I have subsequently found https://github.com/apache/arrow-datafusion/blob/1f4acbb4f0b7fde10b12684c7270069b86c386dc/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L1721-L1762 which appears to test the same things, but maybe not exhaustively 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367734649


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range

Review Comment:
   The second output (expected_open_closed)) of this TestCase is (false, false] is the same as [true, false] which is an invalid interval. As I mentioned [there](https://github.com/apache/arrow-datafusion/pull/7885/files#r1367732159), I think these cases should give an error. Maybe we can wait until I propose the PR which I am working on and planning to submit in a few days. It will explain these initialization issues neatly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367276428


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -235,18 +252,16 @@ impl Display for Interval {
 impl Interval {
     /// Creates a new interval object using the given bounds.
     ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this
+    /// function ensures they are one of the three valid values:
     ///
-    /// For boolean intervals, having an open false lower bound is equivalent to
-    /// having a true closed lower bound. Similarly, open true upper bound is
-    /// equivalent to having a false closed upper bound. Also for boolean
-    /// intervals, having an unbounded left endpoint is equivalent to having a
-    /// false closed lower bound, while having an unbounded right endpoint is
-    /// equivalent to having a true closed upper bound. Therefore; input
-    /// parameters to construct an Interval can have different types, but they
-    /// all result in `[false, false]`, `[false, true]` or `[true, true]`.
+    /// 1. An open `false` lower bound is mapped to a `true` closed lower bound.
+    /// 2. an open `true` upper bound is mapped to  `false` closed

Review Comment:
   ```suggestion
       /// 2. An open `true` upper bound is mapped to  `false` closed
   ```



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -235,18 +252,16 @@ impl Display for Interval {
 impl Interval {
     /// Creates a new interval object using the given bounds.
     ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this
+    /// function ensures they are one of the three valid values:
     ///
-    /// For boolean intervals, having an open false lower bound is equivalent to
-    /// having a true closed lower bound. Similarly, open true upper bound is
-    /// equivalent to having a false closed upper bound. Also for boolean
-    /// intervals, having an unbounded left endpoint is equivalent to having a
-    /// false closed lower bound, while having an unbounded right endpoint is
-    /// equivalent to having a true closed upper bound. Therefore; input
-    /// parameters to construct an Interval can have different types, but they
-    /// all result in `[false, false]`, `[false, true]` or `[true, true]`.
+    /// 1. An open `false` lower bound is mapped to a `true` closed lower bound.
+    /// 2. an open `true` upper bound is mapped to  `false` closed
+    /// upper bound.
+    /// 3. An unbounded lower endpoints is mapped to a `false` closed lower
+    /// bound
+    /// 4. An unbounded lower endpoint is mapped to a `true` closed upper bound.

Review Comment:
   ```suggestion
       /// 4. An unbounded upper endpoint is mapped to a `true` closed upper bound.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367272235


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -194,6 +194,23 @@ impl Display for IntervalBound {
 /// called *unbounded* endpoint and represented using a `NULL` and written using
 /// `∞`.
 ///
+/// # Boolean Handling
+///
+/// Boolean values require special handling. Boolean Intervals NEVER have open
+/// bounds. If you try to and construct one one with open bounds,

Review Comment:
   ```suggestion
   /// bounds. If you try to and construct one with open bounds,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367278071


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (true, false),
+            },
+        ];
+
+        // Asserts that the given interval has the expected (closed) bounds
+        fn assert_bool_interval(
+            name: &str,
+            interval: Interval,
+            expected_lower: bool,
+            expected_upper: bool,
+        ) {
+            let open = false;
+            let expected_interval = Interval {
+                // boolean bounds are always closed
+                lower: IntervalBound {
+                    value: ScalarValue::Boolean(Some(expected_lower)),
+                    open,
+                },
+                upper: IntervalBound {
+                    value: ScalarValue::Boolean(Some(expected_upper)),
+                    open,
+                },
+            };
+            assert_eq!(
+                interval, expected_interval,
+                "{name} failure\n\nActual: {interval:#?}\n\nExpected: {expected_interval:#?}"
+            );
+        }
+
+        for case in cases {
+            println!("Case: {case:?}");

Review Comment:
   ```suggestion
   ```



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: false,
+                upper: true,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, true),
+                expected_closed_open: (false, false),
+            },
+            TestCase {
+                lower: true,
+                upper: false,
+                expected_open_open: (true, false), // whole range
+                expected_open_closed: (true, false),
+                expected_closed_open: (true, false),
+            },
+            TestCase {
+                lower: true,
+                upper: true,
+                expected_open_open: (true, false), // whole range

Review Comment:
   An open `true` upper bound is mapped according to rule 2.
   
   An open `true` lower bound is not mapped, right?



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open

Review Comment:
   ```suggestion
               /// expected bounds when lower is closed, upper is open
   ```



##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -1079,6 +1094,117 @@ mod tests {
         Interval::make(lower, upper, (false, false))
     }
 
+    #[test]
+    fn boolean_interval_test() -> Result<()> {
+        // Demonstrate / document how boolean Intervals work
+        #[derive(Debug)]
+        struct TestCase {
+            lower: bool,
+            upper: bool,
+            /// expected bounds when lower is open, upper is open
+            expected_open_open: (bool, bool),
+            /// expected bounds when lower is open, upper is closed
+            expected_open_closed: (bool, bool),
+            /// expected bounds when lower is closes, upper is open
+            expected_closed_open: (bool, bool),
+            // Not: closed/closed is the same as lower/upper
+        }
+
+        let cases = vec![
+            TestCase {
+                lower: false,
+                upper: false,
+                expected_open_open: (true, false), // whole range

Review Comment:
   An open `false` lower bound is mapped according to rule 1.
   
   An open `false` upper bound is not mapped, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #7885:
URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367138171


##########
datafusion/physical-expr/src/intervals/interval_aritmetic.rs:
##########
@@ -235,18 +247,16 @@ impl Display for Interval {
 impl Interval {
     /// Creates a new interval object using the given bounds.
     ///
-    /// # Boolean intervals need special handling
+    /// As explained in [`Interval]` boolean `Interval`s are special and this

Review Comment:
   Given that there are only three valid boolean intervals, and `Interval::new()` normalizes any provided into into one of those, it might make sense to make the fields of `Interval` non `pub` so that it is not possible to construct an invalid Interval
   
   What do you think @metesynnada  / @berkaysynnada / @ozankabak ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org