Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/12/01 15:15:00 UTC
[jira] [Created] (ARROW-10781) [Rust] [DataFusion] TableProvider should provide row count statistics
Andy Grove created ARROW-10781:
----------------------------------
Summary: [Rust] [DataFusion] TableProvider should provide row count statistics
Key: ARROW-10781
URL: https://issues.apache.org/jira/browse/ARROW-10781
Project: Apache Arrow
Issue Type: New Feature
Components: Rust - DataFusion
Reporter: Andy Grove
In order to start building a cost-based optimizer, we need some statistics about data sources. The most basic statistic would be the number of rows.
I propose that we add a Statistics struct that initially just makes a total row count available but that we can later extend to support more advanced statistics.
{code:java}
struct Statistics {
    row_count: Option<usize>,
}
{code}
We can then add a method to TableProvider:
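{code:java}
trait TableProvider {
    // Takes &self so each provider instance reports its own statistics
    // and the trait can still be used as a trait object.
    fn statistics(&self) -> Option<Statistics>;
}
{code}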
Statistics should be optional because not all data sources can provide statistics.
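To illustrate how an optimizer might consume these statistics, here is a minimal sketch. It assumes the method takes &self, and it uses a reduced stand-in for TableProvider containing only the proposed method. InMemoryTable, StreamingTable, known_row_count, BuildSide and choose_build_side are hypothetical names made up for this example, not existing DataFusion APIs.

```rust
/// Table-level statistics; only a row count for now, as proposed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    /// Total number of rows, if the source knows it.
    pub row_count: Option<usize>,
}

/// Reduced stand-in for TableProvider (assumption: the real trait
/// has other methods that are omitted here).
pub trait TableProvider {
    fn statistics(&self) -> Option<Statistics>;
}

/// Hypothetical in-memory source that knows its exact row count.
pub struct InMemoryTable {
    pub rows: Vec<Vec<i64>>,
}

impl TableProvider for InMemoryTable {
    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics {
            row_count: Some(self.rows.len()),
        })
    }
}

/// Hypothetical streaming source that cannot provide statistics.
pub struct StreamingTable;

impl TableProvider for StreamingTable {
    fn statistics(&self) -> Option<Statistics> {
        None
    }
}

/// Flattens both levels of optionality into one known/unknown row count.
pub fn known_row_count(table: &dyn TableProvider) -> Option<usize> {
    table.statistics().and_then(|s| s.row_count)
}

/// Which join input to build the hash table from (hypothetical).
#[derive(Debug, PartialEq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Toy cost-based decision: build from the smaller input, and fall
/// back to Left whenever either row count is unknown.
pub fn choose_build_side(left: &dyn TableProvider, right: &dyn TableProvider) -> BuildSide {
    match (known_row_count(left), known_row_count(right)) {
        (Some(l), Some(r)) if r < l => BuildSide::Right,
        _ => BuildSide::Left,
    }
}
```

With both the return value and row_count optional, a caller has to handle two "unknown" cases: a source with no statistics at all, and a source with statistics but no row count. The known_row_count helper collapses these into one, which may be all a first cost-based rule needs.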