Posted to issues@madlib.apache.org by "Ekta Khanna (Jira)" <ji...@apache.org> on 2019/11/11 23:01:00 UTC

[jira] [Created] (MADLIB-1392) DL: Preprocessor support for asymmetric segment distribution

Ekta Khanna created MADLIB-1392:
-----------------------------------

             Summary: DL: Preprocessor support for asymmetric segment distribution
                 Key: MADLIB-1392
                 URL: https://issues.apache.org/jira/browse/MADLIB-1392
             Project: Apache MADlib
          Issue Type: New Feature
          Components: Deep Learning
            Reporter: Ekta Khanna
             Fix For: v1.17


Add asymmetric segment redistribution support to the deep learning preprocessor. This applies to {{training_preprocessor_dl()}} and {{validation_preprocessor_dl()}}.
{code:java}
training_preprocessor_dl(source_table,
                         output_table,
                         dependent_varname,
                         independent_varname,
                         buffer_size,
                         normalizing_const,
                         num_classes,
                         distribution_rules    -- new optional param
                        )
{code}
The new optional parameter {{distribution_rules}} (TEXT, *default*: {{all_segments}}) specifies how to distribute the {{output_table}}. This matters because it determines which cluster resources the fit function will use. The possible values are:
 # {{all_segments}} (the default): the {{output_table}} is distributed to all segments in the database cluster.
 # {{gpu_segments}}: the {{output_table}} is distributed to all segments that are on hosts that have GPUs attached. This makes maximum use of GPU resources.
 # The name of a resources table containing the segments to use for training. This table is typically created and maintained by the database administrator. It must contain a column called {{dbid}} that specifies the segment id from the {{gp_segment_configuration}} table.
Sample {{segments_to_use}} table:
{code:java}
 dbid | notes
 -----|--------------
    2 | comment here
    3 | comment here
    4 | comment here
    5 | comment here
{code}
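
The three cases for {{distribution_rules}} can be sketched roughly as follows. This is only an illustration in Python, not the MADlib implementation: the function names, the {{resources_table}} label, and the representation of table rows as dicts are all assumptions made for the example.

```python
# Illustrative sketch only: these helpers are hypothetical, not MADlib internals.
KEYWORDS = ("all_segments", "gpu_segments")


def classify_distribution_rules(distribution_rules="all_segments"):
    """Return which of the three documented cases the argument falls into."""
    if distribution_rules in KEYWORDS:
        return distribution_rules
    # Assumption: any other value is taken to be the name of a resources table.
    return "resources_table"


def dbids_from_resources_table(rows):
    """Collect segment ids from the rows of a resources table.

    `rows` stands in for a query result: a list of dicts keyed by column name.
    """
    if any("dbid" not in row for row in rows):
        raise ValueError("resources table must contain a 'dbid' column")
    return [row["dbid"] for row in rows]


# Rows mirroring the sample segments_to_use table.
segments_to_use = [
    {"dbid": 2, "notes": "comment here"},
    {"dbid": 3, "notes": "comment here"},
    {"dbid": 4, "notes": "comment here"},
    {"dbid": 5, "notes": "comment here"},
]

print(classify_distribution_rules())                   # default case
print(classify_distribution_rules("segments_to_use"))  # table-name case
print(dbids_from_resources_table(segments_to_use))
```

Treating every non-keyword value as a table name is a simplification; a real implementation would presumably also check that the table exists.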

The same optional {{distribution_rules}} parameter, with the same possible values, applies to {{validation_preprocessor_dl()}}.

This change also adds a new column, {{gpu_config}}, to the output summary table. It contains one of the following values:
# if {{distribution_rules}} = {{all_segments}}, then {{all_segments}}
# if {{distribution_rules}} = {{gpu_segments}}, then an array of the segment ids of all segments that are on hosts that have GPUs attached
# if {{distribution_rules}} is the name of a resources table, then an array of the segment ids in that table; for the sample {{segments_to_use}} table shown earlier -> [2,3,4,5]
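
The mapping from the preprocessor argument to the {{gpu_config}} value could look something like the sketch below. It is a hedged illustration in Python, not MADlib code: the function {{gpu_config_value}} and its parameters are hypothetical, and the segment ids passed for the GPU case are made up for the example.

```python
# Illustrative sketch only: gpu_config_value and its parameters are hypothetical.
def gpu_config_value(rule, gpu_segment_ids=(), table_dbids=()):
    """Return the value stored in the gpu_config summary column.

    rule            -- the value passed as the new optional preprocessor argument
    gpu_segment_ids -- ids of segments on hosts with GPUs attached (assumed known)
    table_dbids     -- dbid values read from a resources table, if one was named
    """
    if rule == "all_segments":
        # The keyword itself is recorded; no explicit segment list is stored.
        return "all_segments"
    if rule == "gpu_segments":
        return list(gpu_segment_ids)
    # Otherwise the argument names a resources table.
    return list(table_dbids)


print(gpu_config_value("all_segments"))
print(gpu_config_value("gpu_segments", gpu_segment_ids=[2, 3]))  # made-up ids
print(gpu_config_value("segments_to_use", table_dbids=[2, 3, 4, 5]))
```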



