Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/12/01 15:15:00 UTC

[jira] [Created] (ARROW-10781) [Rust] [DataFusion] TableProvider should provide row count statistics

Andy Grove created ARROW-10781:
----------------------------------

             Summary: [Rust] [DataFusion] TableProvider should provide row count statistics
                 Key: ARROW-10781
                 URL: https://issues.apache.org/jira/browse/ARROW-10781
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust - DataFusion
            Reporter: Andy Grove


To start building a cost-based optimizer, we need some statistics about data sources. The most basic statistic is the number of rows.

I propose adding a Statistics struct that initially exposes only a total row count and that we can later extend with more advanced statistics.
{code:java}
struct Statistics {
  // Total number of rows, or None if the source does not know it.
  row_count: Option<usize>,
}
{code}
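One way to keep the struct easy to extend is sketched below. This is only an illustration: the `Default` derive, the `total_byte_size` field, and the `row_count_only` helper are assumptions that are not part of the proposal. The idea is that callers who fill in only the fields they know keep compiling when new fields are added.

```rust
/// Statistics about a data source. Every field is optional so that a source
/// reports only what it actually knows.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Statistics {
    /// Total number of rows, if known.
    pub row_count: Option<usize>,
    /// Hypothetical later addition (not in the proposal): total size in bytes.
    pub total_byte_size: Option<usize>,
}

/// Hypothetical helper: builds statistics for a source that knows only its
/// row count.
pub fn row_count_only(rows: usize) -> Statistics {
    Statistics {
        row_count: Some(rows),
        // Any field not listed here falls back to `None` via `Default`.
        ..Default::default()
    }
}
```

With struct update syntax, adding another optional field later would not require touching `row_count_only` or similar call sites.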
We can then add a method to TableProvider:
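{code:java}
trait TableProvider {
  // Returns None when the data source cannot provide statistics.
  // Takes &self so the method can be called on a provider instance.
  fn statistics(&self) -> Option<Statistics>;
}
{code}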
Statistics should be optional because not all data sources can provide statistics.
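As a rough sketch of how this could look in practice, the example below uses a simplified stand-in for TableProvider (the real trait has other methods that are omitted here). `InMemoryTable`, `StreamingSource`, `known_row_count`, and `left_is_smaller` are hypothetical names made up for illustration, and the `&self` receiver is an assumption so that the trait can be used as a trait object.

```rust
/// Statistics about a data source (row count only, as proposed).
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    pub row_count: Option<usize>,
}

/// Simplified stand-in for DataFusion's `TableProvider`; only the proposed
/// method is shown.
pub trait TableProvider {
    fn statistics(&self) -> Option<Statistics>;
}

/// Hypothetical source that holds all rows in memory and can count them.
pub struct InMemoryTable {
    pub rows: Vec<Vec<i64>>,
}

impl TableProvider for InMemoryTable {
    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics {
            row_count: Some(self.rows.len()),
        })
    }
}

/// Hypothetical source that cannot know its size up front.
pub struct StreamingSource;

impl TableProvider for StreamingSource {
    fn statistics(&self) -> Option<Statistics> {
        None
    }
}

/// Flattens the two levels of optionality into a single row count.
pub fn known_row_count(table: &dyn TableProvider) -> Option<usize> {
    table.statistics().and_then(|s| s.row_count)
}

/// Reports whether the left input is known to be smaller than the right.
/// Returns `None` when either row count is unknown, so a caller can fall
/// back to a default plan.
pub fn left_is_smaller(left: &dyn TableProvider, right: &dyn TableProvider) -> Option<bool> {
    match (known_row_count(left), known_row_count(right)) {
        (Some(l), Some(r)) => Some(l < r),
        _ => None,
    }
}
```

A cost-based rule could use something like `left_is_smaller` to decide which side of a join to treat as the smaller input, and leave the plan unchanged when it returns `None`.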



--
This message was sent by Atlassian Jira
(v8.3.4#803005)