Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/03 01:14:00 UTC

[jira] [Commented] (MADLIB-1167) Summary - add more statistics

    [ https://issues.apache.org/jira/browse/MADLIB-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308960#comment-16308960 ] 

ASF GitHub Bot commented on MADLIB-1167:
----------------------------------------

GitHub user fmcquillan99 opened a pull request:

    https://github.com/apache/madlib/pull/222

    minor update to summary() user docs

    to finish off
    https://issues.apache.org/jira/browse/MADLIB-1167

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fmcquillan99/incubator-madlib summary-v1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #222
    
----
commit 15628d63bccd4b04789d8963ad1291531f312dc1
Author: Frank McQuillan <fm...@...>
Date:   2018-01-03T01:10:29Z

    minor update to summary() user docs

----


> Summary - add more statistics
> -----------------------------
>
>                 Key: MADLIB-1167
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1167
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Descriptive Statistics
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>             Fix For: v1.14
>
>
> In the summary function
> http://madlib.apache.org/docs/latest/group__grp__summary.html
> add additional statistics:
> 1) % positive values
> 2) % negative values
> 3) % zero values
> 4) confidence intervals (95% ?) on mean
> * Does this make sense, given that it requires assuming a distribution for the data, which we probably cannot infer?
> 5) Also, please check why min and max are being reported for non-numeric columns. Is this a bug?
> {code}
> madlib=# SELECT * FROM houses_summary where target_column='zipcode';
> -[ RECORD 1 ]--------+----------------
> group_by             | 
> group_by_value       | 
> target_column        | zipcode
> column_number        | 8
> data_type            | text
> row_count            | 15
> distinct_values      | 2
> missing_values       | 0
> blank_values         | 0
> fraction_missing     | 0
> fraction_blank       | 0
> mean                 | 
> variance             | 
> min                  | 6
> max                  | 6
> first_quartile       | 
> median               | 
> third_quartile       | 
> most_frequent_values | {94301y,84301x}
> mfv_frequencies      | {10,5}
> {code}
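
For reference, items 1) through 4) in the list above can be sketched as follows. This is a minimal illustration in Python, not MADlib's implementation: the function name `sign_fractions_and_ci` and its return keys are hypothetical, fractions are taken over non-null values (an assumption; the issue does not say what the denominator should be), and the 95% interval uses a normal approximation with z = 1.96, which is exactly the distributional assumption questioned in item 4).

```python
import math
import statistics


def sign_fractions_and_ci(values, z=1.96):
    """Return sign fractions and an approximate confidence interval on the mean.

    Hypothetical helper for illustration only. None entries are treated as
    missing and excluded. The interval is mean +/- z * s / sqrt(n), which
    assumes the sample mean is approximately normally distributed.
    """
    data = [v for v in values if v is not None]
    n = len(data)
    if n == 0:
        raise ValueError("no non-null values")

    mean = statistics.fmean(data)
    result = {
        "positive": sum(1 for v in data if v > 0) / n,
        "negative": sum(1 for v in data if v < 0) / n,
        "zero": sum(1 for v in data if v == 0) / n,
        "mean": mean,
        "ci": None,  # left undefined when there are fewer than two values
    }

    if n > 1:
        # Sample standard deviation (n - 1 denominator) over sqrt(n)
        half_width = z * statistics.stdev(data) / math.sqrt(n)
        result["ci"] = (mean - half_width, mean + half_width)
    return result


if __name__ == "__main__":
    print(sign_fractions_and_ci([-2, 0, 0, 1, 3, None]))
```

For small samples, a Student's t quantile would be a more conservative choice than the fixed z value, but it still rests on an assumption about the underlying distribution.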



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)