Posted to issues@flink.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/17 09:08:00 UTC

[jira] [Updated] (FLINK-12671) Summarizer: summary statistics for Table

     [ https://issues.apache.org/jira/browse/FLINK-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-12671:
-----------------------------------
    Labels: auto-unassigned pull-request-available  (was: auto-unassigned)

> Summarizer: summary statistics for Table
> ----------------------------------------
>
>                 Key: FLINK-12671
>                 URL: https://issues.apache.org/jira/browse/FLINK-12671
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Library / Machine Learning
>            Reporter: Xu Yang
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available
>
> We provide summary statistics for a Table through Summarizer. Users can easily get the total row count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values, and the number of valid values.
> SparkML offers the same functionality: [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
>  
> {code:java|title=Example|borderStyle=solid}
>         // Build a small batch Table from in-memory rows.
>         String[] colNames = new String[]{"id", "height", "weight"};
>         Row[] data = new Row[]{
>             Row.of(1, 168, 48.1),
>             Row.of(2, 165, 45.8),
>             Row.of(3, 160, 45.3),
>             Row.of(4, 163, 41.9),
>             Row.of(5, 149, 40.5),
>         };
>         Table input = MLSession.createBatchTable(data, colNames);
>         // Compute the summary statistics for every column.
>         TableSummary summary = new Summarizer(input).collectResult();
>         // Query a single metric, then print the full summary.
>         System.out.println(summary.mean("height"));
>         System.out.println(summary);
> {code}
>  
>  
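For readers who want to see what the column-wise metrics listed in the description amount to, here is a minimal plain-Java sketch that computes them for a single numeric column. It is not the Summarizer implementation from the pull request and uses no Flink classes. The names ColumnSummarySketch, Stats, and summarize are hypothetical, a null entry is assumed to mean a missing value, and the variance is assumed to be the sample variance (divisor n - 1), which the issue does not specify.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Flink: summary metrics for one column. */
final class ColumnSummarySketch {

    /** Plain holder for the metrics named in the issue description. */
    static final class Stats {
        long count;                  // all rows, including missing values
        long numMissing;             // rows whose value is null (assumption)
        long numValid;               // rows with a non-null value
        double max = Double.NaN;     // NaN when there is no valid value
        double min = Double.NaN;
        double mean = Double.NaN;
        double variance = Double.NaN;           // sample variance, NaN if numValid < 2
        double standardDeviation = Double.NaN;
        double normL1;               // sum of absolute values
        double normL2;               // square root of the sum of squares
    }

    static Stats summarize(List<Double> column) {
        Stats s = new Stats();
        double sum = 0.0;
        double sumSq = 0.0;

        // First pass: counts, extremes, and the sums behind mean and norms.
        for (Double v : column) {
            s.count++;
            if (v == null) {
                s.numMissing++;
                continue;
            }
            s.numValid++;
            s.max = (s.numValid == 1) ? v : Math.max(s.max, v);
            s.min = (s.numValid == 1) ? v : Math.min(s.min, v);
            sum += v;
            sumSq += v * v;
            s.normL1 += Math.abs(v);
        }
        s.normL2 = Math.sqrt(sumSq);
        if (s.numValid == 0) {
            return s;
        }
        s.mean = sum / s.numValid;

        // Second pass: squared deviations from the mean.
        if (s.numValid > 1) {
            double sqDev = 0.0;
            for (Double v : column) {
                if (v != null) {
                    sqDev += (v - s.mean) * (v - s.mean);
                }
            }
            s.variance = sqDev / (s.numValid - 1);
            s.standardDeviation = Math.sqrt(s.variance);
        }
        return s;
    }

    public static void main(String[] args) {
        // The "height" column from the example in the issue.
        Stats s = summarize(Arrays.asList(168.0, 165.0, 160.0, 163.0, 149.0));
        System.out.println("mean=" + s.mean + ", variance=" + s.variance);
    }
}
```

The two-pass variance is chosen here for readability. A distributed implementation would more likely keep mergeable running sums per partition, but that design is not described in the issue.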



--
This message was sent by Atlassian Jira
(v8.20.7#820007)