Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 01:14:00 UTC
[jira] [Commented] (MADLIB-1167) Summary - add more statistics
[ https://issues.apache.org/jira/browse/MADLIB-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308960#comment-16308960 ]
ASF GitHub Bot commented on MADLIB-1167:
----------------------------------------
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/222
minor update to summary() user docs
to finish off
https://issues.apache.org/jira/browse/MADLIB-1167
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fmcquillan99/incubator-madlib summary-v1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/222.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #222
----
commit 15628d63bccd4b04789d8963ad1291531f312dc1
Author: Frank McQuillan <fm...@...>
Date: 2018-01-03T01:10:29Z
minor update to summary() user docs
----
> Summary - add more statistics
> -----------------------------
>
> Key: MADLIB-1167
> URL: https://issues.apache.org/jira/browse/MADLIB-1167
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Descriptive Statistics
> Reporter: Frank McQuillan
> Assignee: Jingyi Mei
> Fix For: v1.14
>
>
> In the summary function
> http://madlib.apache.org/docs/latest/group__grp__summary.html
> add the following statistics:
> 1) % positive values
> 2) % negative values
> 3) % zero values
> 4) confidence intervals (95%?) on the mean
> * does this make sense, given that we would need to assume a distribution for the data, which we probably cannot infer?
> 5) Also, please check why min and max are being reported for non-numeric columns. Is this a bug?
> {code}
> madlib=# SELECT * FROM houses_summary where target_column='zipcode';
> -[ RECORD 1 ]--------+----------------
> group_by             |
> group_by_value       |
> target_column        | zipcode
> column_number        | 8
> data_type            | text
> row_count            | 15
> distinct_values      | 2
> missing_values       | 0
> blank_values         | 0
> fraction_missing     | 0
> fraction_blank       | 0
> mean                 |
> variance             |
> min                  | 6
> max                  | 6
> first_quartile       |
> median               |
> third_quartile       |
> most_frequent_values | {94301y,84301x}
> mfv_frequencies      | {10,5}
> {code}
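Items 1) to 4) above can be prototyped outside the database before being wired into summary(). The Python below is a minimal sketch, not MADlib's implementation. The output names fraction_positive, fraction_negative and fraction_zero are hypothetical; they were chosen to match the existing fraction_missing and fraction_blank columns.

On the question under 4): the central limit theorem makes the sample mean approximately normal for large row counts whenever the data have finite variance. A normal-approximation interval, mean +/- 1.96 * s / sqrt(n), therefore does not require knowing the distribution of the data itself. For small samples or heavily skewed data it can be noticeably off, and a Student t critical value would give a wider, more conservative interval.

```python
import math
from statistics import mean, stdev


def sign_fractions(values):
    """Return (fraction_positive, fraction_negative, fraction_zero).

    NULLs (None) are left out of the denominator, on the assumption that
    they are already reported separately via fraction_missing.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if n == 0:
        return None, None, None
    positive = sum(1 for v in present if v > 0)
    negative = sum(1 for v in present if v < 0)
    zero = sum(1 for v in present if v == 0)
    return positive / n, negative / n, zero / n


def mean_ci95(values, z=1.96):
    """Normal-approximation (CLT) 95% confidence interval for the mean.

    Uses the sample standard deviation; needs at least two non-NULL values.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    if n < 2:
        return None
    half_width = z * stdev(present) / math.sqrt(n)  # z * standard error
    m = mean(present)
    return m - half_width, m + half_width


# Example column with one NULL: 3 positive, 2 negative and 2 zero values.
col = [3, -1, 0, 0, 5, None, 2, -4]
print(sign_fractions(col))
print(mean_ci95([1, 2, 3, 4, 5]))
```

Excluding NULLs from the denominator keeps the three new fractions summing to 1 over the non-missing rows. Dividing by row_count instead would be an equally defensible choice.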
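On item 5): in the output above, both distinct zipcode values, 94301y and 84301x, are six characters long, which matches min = max = 6. That suggests min and max for a text column report string lengths rather than values. If so, this would be a documentation gap rather than a bug; that is an inference from this output, not something confirmed here.

A sketch of that type-aware behavior follows. column_min_max is a hypothetical helper name, not part of MADlib.

```python
def column_min_max(values):
    """Return (min, max) of a column, ignoring NULLs (None).

    Hypothetical helper: numeric columns report their smallest and largest
    values; text columns report the shortest and longest string lengths.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None, None
    if all(isinstance(v, str) for v in present):
        lengths = [len(v) for v in present]
        return min(lengths), max(lengths)
    # A SQL column holds a single type, so mixed types are not expected here.
    return min(present), max(present)


# The zipcode column from the output above: 10 x '94301y', 5 x '84301x'.
zipcode = ["94301y"] * 10 + ["84301x"] * 5
print(column_min_max(zipcode))  # (6, 6)
```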
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)