Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/03/15 16:29:52 UTC

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5071: DataFrame count method

tustvold commented on code in PR #5071:
URL: https://github.com/apache/arrow-datafusion/pull/5071#discussion_r1137405371


##########
datafusion/core/src/dataframe.rs:
##########
@@ -361,6 +363,39 @@ impl DataFrame {
         Ok(DataFrame::new(self.session_state, plan))
     }
 
+    /// Run a count aggregate on the DataFrame and execute the DataFrame to collect this
+    /// count and return it as a usize, to find the total number of rows after executing
+    /// the DataFrame.
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let ctx = SessionContext::new();
+    /// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+    /// let count = df.count().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    pub async fn count(self) -> Result<usize> {
+        let rows = self
+            .aggregate(
+                vec![],
+                vec![datafusion_expr::count(Expr::Literal(ScalarValue::Null))],

Review Comment:
   This was only working by accident, due to a quirk of NullArray: it has no null buffer even though all of its values are null. The count reported by this query should be 0, as only non-null values are counted.
   
   Fix in https://github.com/apache/arrow-datafusion/pull/5612/files#diff-932cfd7271917561280a69edabb35cfd109d22ae736f77e85adcf63455918121R630
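
   To illustrate the quirk described above, here is a minimal, self-contained sketch. It does not use the real arrow-rs or DataFusion APIs: `ToyArray`, `count_via_null_buffer` and `count_non_null` are hypothetical names made up for this example, and the model assumes the count aggregate derives its result from the null buffer alone. It shows how an all-null array without a null buffer can be counted as if every row were valid, while a count that respects the array's logical nulls returns 0.

```rust
/// Toy stand-in for an Arrow-style array (hypothetical, not the arrow-rs API).
struct ToyArray {
    len: usize,
    /// Validity bitmap: `true` = valid, `false` = null. `None` = no null buffer.
    validity: Option<Vec<bool>>,
    /// Models a NullArray-like type whose values are all null by definition.
    all_null_type: bool,
}

impl ToyArray {
    /// Nulls recorded in the null buffer; 0 when there is no buffer.
    fn physical_null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |bits| bits.iter().filter(|valid| !**valid).count())
    }

    /// Nulls as a reader of the data would see them.
    fn logical_null_count(&self) -> usize {
        if self.all_null_type {
            self.len
        } else {
            self.physical_null_count()
        }
    }
}

/// An all-null array that carries no null buffer (the quirk in question).
fn null_like(len: usize) -> ToyArray {
    ToyArray { len, validity: None, all_null_type: true }
}

/// A column holding the same non-null constant in every row.
fn constant_non_null(len: usize) -> ToyArray {
    ToyArray { len, validity: None, all_null_type: false }
}

/// Count that only consults the null buffer (assumed model of the accident).
fn count_via_null_buffer(a: &ToyArray) -> usize {
    a.len - a.physical_null_count()
}

/// Count of values that are actually non-null.
fn count_non_null(a: &ToyArray) -> usize {
    a.len - a.logical_null_count()
}

fn main() {
    let nulls = null_like(3);
    // No null buffer, so every row looks valid: prints 3.
    println!("{}", count_via_null_buffer(&nulls));
    // Only non-null values are counted: prints 0.
    println!("{}", count_non_null(&nulls));
    // Counting a non-null constant yields the row count: prints 3.
    println!("{}", count_non_null(&constant_non_null(3)));
}
```

   In this toy model, counting a non-null constant is one way to get the row count without depending on the quirk. Whether the linked fix takes that approach is not shown here.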


