Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2020/03/05 01:24:00 UTC

[jira] [Commented] (MADLIB-1413) Last optional param in summary errors when NULL

    [ https://issues.apache.org/jira/browse/MADLIB-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051724#comment-17051724 ] 

Frank McQuillan commented on MADLIB-1413:
-----------------------------------------

{code}
DROP TABLE IF EXISTS abalone_summary ;
SELECT madlib.summary(
    'abalone_encoded',  -- source_table
    'abalone_summary',  -- output_table
    NULL,               -- target_cols
    NULL                -- grouping_cols
);
{code}
produces
{code}
group_by             | 
group_by_value       | 
target_column        | age
column_number        | 10
data_type            | numeric
row_count            | 4177
distinct_values      | 28
missing_values       | 0
blank_values         | 
fraction_missing     | 0
fraction_blank       | 
positive_values      | 4177
negative_values      | 0
zero_values          | 0
mean                 | 11.4336844625329
variance             | 10.3952659473471
confidence_interval  | {11.3359063530521,11.5314625720137}
min                  | 2.5
max                  | 30.5
first_quartile       | 9.5
median               | 10.5
third_quartile       | 12.5
most_frequent_values | {10.5,11.5,9.5,12.5,8.5,13.5,7.5,14.5,15.5,6.5}
mfv_frequencies      | {689,634,568,487,391,267,259,203,126,115}
{code}
LGTM

> Last optional param in summary errors when NULL
> -----------------------------------------------
>
>                 Key: MADLIB-1413
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1413
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Descriptive Statistics
>            Reporter: Frank McQuillan
>            Assignee: Orhan Kislal
>            Priority: Minor
>             Fix For: v1.17
>
>
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary ( 
>     'abalone_encoded',  -- source_table
>     'abalone_summary_exact',  -- output_table
>     NULL,  -- target_cols
>     NULL,  -- grouping_cols
>     TRUE,  -- get_distinct
>     TRUE,  -- get_quartiles
>     NULL,  -- quantile_array
>     10,    -- how_many_mfv
>     FALSE, -- get_estimates
>     NULL   -- n_cols_per_run
> );
> {code}
> produces this error
> {code}
> ERROR:  plpy.Error: Summary - Invalid parameter: Number of columns per run should be positive (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "summary", line 24, in <module>
>     get_estimates, n_cols_per_run)
>   PL/Python function "summary", line 67, in summary
>   PL/Python function "summary", line 388, in run
>   PL/Python function "summary", line 105, in _validate_params
>   PL/Python function "summary", line 117, in _assert
> PL/Python function "summary"
> {code}
> which seems wrong, since the last parameter ({{n_cols_per_run}}) is optional, so passing NULL should behave the same as omitting it.
> The following does work:
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary ( 
>     'abalone_encoded',  -- source_table
>     'abalone_summary_exact',  -- output_table
>     NULL,  -- target_cols
>     NULL,  -- grouping_cols
>     TRUE,  -- get_distinct
>     TRUE,  -- get_quartiles
>     NULL,  -- quantile_array
>     10,    -- how_many_mfv
>     FALSE, -- get_estimates
>     15     -- n_cols_per_run
> );
> {code}
> and so does this:
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary ( 
>     'abalone_encoded',  -- source_table
>     'abalone_summary_exact',  -- output_table
>     NULL,  -- target_cols
>     NULL,  -- grouping_cols
>     TRUE,  -- get_distinct
>     TRUE,  -- get_quartiles
>     NULL,  -- quantile_array
>     10,    -- how_many_mfv
>     FALSE  -- get_estimates
> );
> {code}
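The traceback points at {{_validate_params}}, which appears to run a positivity check on {{n_cols_per_run}} without first handling SQL NULL (PL/Python passes NULL to Python as {{None}}). Below is a minimal sketch of the usual fix. It is not MADlib's actual code: the helper name, the exception class, and the default of 15 (taken from the working call in the report) are all assumptions. The idea is to substitute the default for NULL before validating:

{code:python}
# Hypothetical sketch only; names and the default value are assumptions,
# not MADlib's real implementation.
DEFAULT_N_COLS_PER_RUN = 15  # assumed default, matching the working example


class SummaryParamError(ValueError):
    """Raised when a summary() parameter fails validation (illustrative)."""


def resolve_n_cols_per_run(n_cols_per_run):
    """Map SQL NULL (None in PL/Python) to the default, then validate."""
    if n_cols_per_run is None:
        # Treat NULL exactly like an omitted optional argument.
        n_cols_per_run = DEFAULT_N_COLS_PER_RUN
    if n_cols_per_run <= 0:
        # Explicit non-positive values are still rejected.
        raise SummaryParamError(
            "Summary - Invalid parameter: "
            "Number of columns per run should be positive")
    return n_cols_per_run


if __name__ == "__main__":
    print(resolve_n_cols_per_run(None))  # prints 15
    print(resolve_n_cols_per_run(4))     # prints 4
    try:
        resolve_n_cols_per_run(0)
    except SummaryParamError as err:
        print(err)
{code}

Applying the default before the positivity check keeps an explicit 0 or negative value an error, while making NULL behave the same as leaving the argument off, as in the third call in the report.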



--
This message was sent by Atlassian Jira
(v8.3.4#803005)