Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/06/09 22:48:18 UTC

[jira] [Closed] (MADLIB-1117) Add "columns to process per pass" as an optional param for summary()

     [ https://issues.apache.org/jira/browse/MADLIB-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-1117.
-----------------------------------

> Add "columns to process per pass" as an optional param for summary()
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1117
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1117
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Sketch-based Estimators
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.12
>
>
> Context
> The summary() function
> http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
> currently processes 15 columns per pass to keep memory usage below the 1 GB limit.  This is a somewhat arbitrary limit, since memory usage depends on many things, including the data set and which params in summary() are set.  If more columns could be processed per pass, summary() would run faster.
> Story
> As a MADlib developer, I want to add "columns to process per pass" as an optional param for the summary() function.  Default: use 15 columns (which is the current setting).  Suggested param name:  "columns_per_pass", though if you have a better name, that's fine.
> Acceptance
> 1) Add new optional parameter and update docs.  Please add a note so it is clear what this control does.
> 2) Write and pass tests.
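
The batching behavior described in the issue can be illustrated with a minimal sketch. This is not MADlib's implementation: the helper name column_batches is hypothetical, and only the default of 15 and the suggested parameter name columns_per_pass come from the issue text. It shows how a larger value means fewer passes over the data, on the assumption that each pass costs one scan of the table.

```python
from typing import Iterator, List, Sequence


def column_batches(
    columns: Sequence[str], columns_per_pass: int = 15
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `columns_per_pass` column names.

    Hypothetical helper for illustration only; the default of 15 mirrors
    the current setting mentioned in the issue.
    """
    if columns_per_pass < 1:
        raise ValueError("columns_per_pass must be a positive integer")
    for start in range(0, len(columns), columns_per_pass):
        # Each slice stands for the columns summarized in one pass.
        yield list(columns[start:start + columns_per_pass])


# Example: a table with 40 columns (names are made up).
cols = [f"col{i}" for i in range(1, 41)]

default_passes = list(column_batches(cols))      # batches of 15, 15, 10
wider_passes = list(column_batches(cols, 20))    # batches of 20, 20

print(len(default_passes), "passes with the default of 15")
print(len(wider_passes), "passes with columns_per_pass=20")
```

Raising columns_per_pass reduces the number of passes but, per the issue, increases memory use per pass, so the right value depends on the data set and on which summary() options are enabled.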



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)