You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2024/03/20 17:30:43 UTC

[PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTICT [arrow-datafusion]

alamb opened a new pull request, #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712

   
   
   ## Which issue does this PR close?
   
   Follow on to https://github.com/apache/arrow-datafusion/pull/9679
   
   ## Rationale for this change
   @Omega359  pointed out that the use of `expect` would panic on unexpected results: https://github.com/apache/arrow-datafusion/pull/9679#discussion_r1531304375
   
   ## What changes are included in this PR?
   Change a panic into an internal error
   
   ## Are these changes tested?
   No (it is "impossible" to hit this code)
   
   ## Are there any user-facing changes?
   
   Better UX on unexpected error


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#discussion_r1534359976


##########
datafusion/physical-expr/src/aggregate/count_distinct/mod.rs:
##########
@@ -268,8 +268,11 @@ impl Accumulator for DistinctCountAccumulator {
         let array = &states[0];
         let list_array = array.as_list::<i32>();
         for inner_array in list_array.iter() {
-            let inner_array = inner_array
-                .expect("counts are always non null, so are intermediate results");
+            let Some(inner_array) = inner_array else {
+                return internal_err!(
+                    "intermediate results of count distinct should always be non0null"

Review Comment:
   Thank you 🤦 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on code in PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#discussion_r1532880670


##########
datafusion/physical-expr/src/aggregate/count_distinct/mod.rs:
##########
@@ -268,8 +268,11 @@ impl Accumulator for DistinctCountAccumulator {
         let array = &states[0];
         let list_array = array.as_list::<i32>();
         for inner_array in list_array.iter() {
-            let inner_array = inner_array
-                .expect("counts are always non null, so are intermediate results");
+            let Some(inner_array) = inner_array else {
+                return internal_err!(
+                    "intermediate results of count distinct should always be non0null"

Review Comment:
   ```suggestion
                       "intermediate results of count distinct should always be non null"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#discussion_r1534362354


##########
datafusion/physical-expr/src/aggregate/count_distinct/mod.rs:
##########
@@ -268,8 +268,11 @@ impl Accumulator for DistinctCountAccumulator {
         let array = &states[0];
         let list_array = array.as_list::<i32>();
         for inner_array in list_array.iter() {

Review Comment:
   I think filtering out nulls would potentially silently ignore errors
   
   I think the rationale goes like
   1. This code is used to combine multiple partial results (the results of calling `Self::state()`)
   2. Since `Self::state` never returns `None` -- it always returns `Ok(vec![ScalarValue::List(arr)])` then this merge code should never receive a `None`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on code in PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#discussion_r1534378061


##########
datafusion/physical-expr/src/aggregate/count_distinct/mod.rs:
##########
@@ -268,8 +268,11 @@ impl Accumulator for DistinctCountAccumulator {
         let array = &states[0];
         let list_array = array.as_list::<i32>();
         for inner_array in list_array.iter() {
-            let inner_array = inner_array
-                .expect("counts are always non null, so are intermediate results");
+            let Some(inner_array) = inner_array else {
+                return internal_err!(
+                    "intermediate results of count distinct should always be non null"

Review Comment:
   ```suggestion
                       "Intermediate results of COUNT DISTINCT should always be non null"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#issuecomment-2013373607

   Thanks for the review @comphead 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Minor: return internal error rather than panic on unexpected error in COUNT DISTINCT [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on code in PR #9712:
URL: https://github.com/apache/arrow-datafusion/pull/9712#discussion_r1532881478


##########
datafusion/physical-expr/src/aggregate/count_distinct/mod.rs:
##########
@@ -268,8 +268,11 @@ impl Accumulator for DistinctCountAccumulator {
         let array = &states[0];
         let list_array = array.as_list::<i32>();
         for inner_array in list_array.iter() {

Review Comment:
   should we just filter out Nones? just thinking



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org