Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2020/03/05 01:24:00 UTC
[jira] [Commented] (MADLIB-1413) Last optional param in summary errors when NULL
[ https://issues.apache.org/jira/browse/MADLIB-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051724#comment-17051724 ]
Frank McQuillan commented on MADLIB-1413:
-----------------------------------------
{code}
DROP TABLE IF EXISTS abalone_summary;
SELECT madlib.summary (
    'abalone_encoded',  -- source_table
    'abalone_summary',  -- output_table
    NULL,               -- target_cols
    NULL                -- grouping_cols
);
{code}
produces
{code}
group_by             |
group_by_value       |
target_column        | age
column_number        | 10
data_type            | numeric
row_count            | 4177
distinct_values      | 28
missing_values       | 0
blank_values         |
fraction_missing     | 0
fraction_blank       |
positive_values      | 4177
negative_values      | 0
zero_values          | 0
mean                 | 11.4336844625329
variance             | 10.3952659473471
confidence_interval  | {11.3359063530521,11.5314625720137}
min                  | 2.5
max                  | 30.5
first_quartile       | 9.5
median               | 10.5
third_quartile       | 12.5
most_frequent_values | {10.5,11.5,9.5,12.5,8.5,13.5,7.5,14.5,15.5,6.5}
mfv_frequencies      | {689,634,568,487,391,267,259,203,126,115}
{code}
LGTM
> Last optional param in summary errors when NULL
> -----------------------------------------------
>
> Key: MADLIB-1413
> URL: https://issues.apache.org/jira/browse/MADLIB-1413
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Descriptive Statistics
> Reporter: Frank McQuillan
> Assignee: Orhan Kislal
> Priority: Minor
> Fix For: v1.17
>
>
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary (
> 'abalone_encoded', -- source_table
> 'abalone_summary_exact', -- output_table
> NULL, -- target_cols
> NULL, -- grouping_cols
> TRUE, -- get_distinct
> TRUE, -- get_quartiles
> NULL, -- quantile_array
> 10, -- how_many_mfv
> FALSE, -- get_estimate
> NULL -- n_cols_per_run
> );
> {code}
> produces this error
> {code}
> ERROR: plpy.Error: Summary - Invalid parameter: Number of columns per run should be positive (plpython.c:5038)
> CONTEXT: Traceback (most recent call last):
> PL/Python function "summary", line 24, in <module>
> get_estimates, n_cols_per_run)
> PL/Python function "summary", line 67, in summary
> PL/Python function "summary", line 388, in run
> PL/Python function "summary", line 105, in _validate_params
> PL/Python function "summary", line 117, in _assert
> PL/Python function "summary"
> {code}
> which seems wrong since the last param is optional.
> The following does work:
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary (
> 'abalone_encoded', -- source_table
> 'abalone_summary_exact', -- output_table
> NULL, -- target_cols
> NULL, -- grouping_cols
> TRUE, -- get_distinct
> TRUE, -- get_quartiles
> NULL, -- quantile_array
> 10, -- how_many_mfv
> FALSE, -- get_estimate
> 15 -- n_cols_per_run
> );
> {code}
> and so does this:
> {code}
> DROP TABLE IF EXISTS abalone_summary_exact;
> SELECT madlib.summary (
> 'abalone_encoded', -- source_table
> 'abalone_summary_exact', -- output_table
> NULL, -- target_cols
> NULL, -- grouping_cols
> TRUE, -- get_distinct
> TRUE, -- get_quartiles
> NULL, -- quantile_array
> 10, -- how_many_mfv
> FALSE -- get_estimate
> );
> {code}
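For illustration only, here is a minimal sketch of how a validation step could treat an explicit NULL for n_cols_per_run the same as an omitted argument. It is not the actual MADlib fix. The helper name resolve_n_cols_per_run and the constant DEFAULT_N_COLS_PER_RUN are hypothetical, and the fallback of 15 is an assumption borrowed from the working call quoted above. It relies on PL/Python passing SQL NULL to the function as None.

```python
# Hypothetical fallback; 15 is assumed from the call that works in the issue.
DEFAULT_N_COLS_PER_RUN = 15


def resolve_n_cols_per_run(n_cols_per_run=None):
    """Return a usable batch size, treating None (SQL NULL) as 'not supplied'."""
    if n_cols_per_run is None:
        # An explicit NULL falls back to the default instead of failing
        # the positivity check.
        return DEFAULT_N_COLS_PER_RUN
    if n_cols_per_run <= 0:
        # Same message as the error reported in the issue.
        raise ValueError(
            "Summary - Invalid parameter: "
            "Number of columns per run should be positive")
    return n_cols_per_run
```

With a check of this shape, passing NULL, passing 15, and omitting the argument would all resolve to the same batch size, while zero or a negative value would still be rejected.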
--
This message was sent by Atlassian Jira
(v8.3.4#803005)