Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/12/27 00:11:00 UTC
[jira] [Resolved] (MADLIB-1167) Summary - add more statistics
[ https://issues.apache.org/jira/browse/MADLIB-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan resolved MADLIB-1167.
-------------------------------------
Resolution: Fixed
> Summary - add more statistics
> -----------------------------
>
> Key: MADLIB-1167
> URL: https://issues.apache.org/jira/browse/MADLIB-1167
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Descriptive Statistics
> Reporter: Frank McQuillan
> Assignee: Jingyi Mei
> Fix For: v1.14
>
>
> In the summary function
> http://madlib.apache.org/docs/latest/group__grp__summary.html
> add additional statistics:
> 1) % positive values
> 2) % negative values
> 3) % zero values
> 4) confidence intervals (95%?) on the mean
> * Does this make sense, given that we would need to assume a distribution for the data, which we probably cannot infer?
> 5) Also, please check why min and max are being reported for non-numeric columns. Is this a bug?
> {code}
> madlib=# SELECT * FROM houses_summary where target_column='zipcode';
> -[ RECORD 1 ]--------+----------------
> group_by             |
> group_by_value       |
> target_column        | zipcode
> column_number        | 8
> data_type            | text
> row_count            | 15
> distinct_values      | 2
> missing_values       | 0
> blank_values         | 0
> fraction_missing     | 0
> fraction_blank       | 0
> mean                 |
> variance             |
> min                  | 6
> max                  | 6
> first_quartile       |
> median               |
> third_quartile       |
> most_frequent_values | {94301y,84301x}
> mfv_frequencies      | {10,5}
> {code}
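For reference, items 1) to 4) above could be computed roughly as in the Python sketch below. This is only an illustration, not the MADlib implementation: the function name extra_summary_stats and its return keys are hypothetical, fractions are taken over non-NULL values (an assumption; the issue does not say which denominator is intended), and the interval uses a normal approximation for the sample mean (z = 1.96), which is exactly the distributional assumption questioned in item 4).

```python
import math
from statistics import mean, stdev


def extra_summary_stats(values, z=1.96):
    """Hypothetical helper: sign fractions and an approximate CI on the mean.

    `values` is one numeric column; None stands in for SQL NULL and is ignored.
    """
    nums = [v for v in values if v is not None]
    n = len(nums)
    if n == 0:
        return None  # nothing to summarize

    stats = {
        # Fractions over non-NULL values (assumed denominator).
        "positive": sum(1 for v in nums if v > 0) / n,
        "negative": sum(1 for v in nums if v < 0) / n,
        "zero": sum(1 for v in nums if v == 0) / n,
        "mean": mean(nums),
        "mean_ci": None,
    }

    if n > 1:
        # Normal-approximation interval: mean +/- z * s / sqrt(n),
        # where s is the sample standard deviation.
        half_width = z * stdev(nums) / math.sqrt(n)
        stats["mean_ci"] = (stats["mean"] - half_width,
                            stats["mean"] + half_width)
    return stats


if __name__ == "__main__":
    print(extra_summary_stats([-2, 0, 0, 1, 3, None]))
```

With few rows or heavily skewed data the interval can be misleading, so it may be worth documenting the assumption alongside the new columns.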
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)