Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2019/11/23 01:11:11 UTC
[GitHub] [madlib] fmcquillan99 commented on issue #459: DL: Add support for asymmetric segment distribution to preprocessor

URL: https://github.com/apache/madlib/pull/459#issuecomment-557749546
 
 
   Please review tests (0) and (5), which I think may need some changes.
   
   
   (0)
   I think `gpu_config` in the output table should be changed to `distribution_rules`
   to match the name of the input parameter.
   
   (1)
   CPUs only
   ```
   DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                           'image_data_packed',  -- Output table
                                           'species',            -- Dependent variable
                                           'rgb',                -- Independent variable
                                           NULL,                 -- Buffer size
                                           255,                  -- Normalizing constant
                                           NULL,                 -- Number of classes
                                           'all_segments'        -- Distribution rules
                                           );
   
   SELECT * FROM image_data_packed_summary;
   
   -[ RECORD 1 ]-------+------------------
   source_table        | image_data
   output_table        | image_data_packed
   dependent_varname   | species
   independent_varname | rgb
   dependent_vartype   | text
   class_values        | {bird,cat,dog}
   buffer_size         | 26
   normalizing_const   | 255
   num_classes         | 3
   gpu_config          | all_segments
   ```
   OK
   
   
   (2)
   Table `xxx` exists but has the wrong format for a distribution table
   ```
   DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                           'image_data_packed',  -- Output table
                                           'species',            -- Dependent variable
                                           'rgb',                -- Independent variable
                                           NULL,                 -- Buffer size
                                           255,                  -- Normalizing constant
                                           NULL,                 -- Number of classes
                                           'xxx'                 -- Distribution rules
                                           );
   
   ERROR:  plpy.Error: training_preprocessor_dl: segments_to_use table must contain dbib column (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "training_preprocessor_dl", line 24, in <module>
       training_preprocessor_obj.training_preprocessor_dl()
     PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 271, in input_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 96, in _assert
   PL/Python function "training_preprocessor_dl"
   ```
   OK
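
   Before passing a custom table as the last argument, a quick catalog check can show whether it has the `dbid` column that the valid table in test (5) carries. This is only a sketch and not part of the PR; `xxx` is the table name from this test, and the query assumes the standard `information_schema.columns` view.

   ```sql
   -- Sketch: list the columns of a candidate distribution table.
   -- 'xxx' is the table name from test (2); substitute your own.
   SELECT column_name, data_type
   FROM information_schema.columns
   WHERE table_name = 'xxx'
   ORDER BY ordinal_position;
   ```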
   
   
   (3)
   Distribution table `yyy` does not exist
   ```
   DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                           'image_data_packed',  -- Output table
                                           'species',            -- Dependent variable
                                           'rgb',                -- Independent variable
                                           NULL,                 -- Buffer size
                                           255,                  -- Normalizing constant
                                           NULL,                 -- Number of classes
                                           'yyy'                 -- Distribution rules
                                           );
   
   ERROR:  plpy.Error: training_preprocessor_dl error: Input table 'yyy' does not exist. (plpython.c:5038)
   DETAIL:  segments_to_use table (yyy) doesn't exist.
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "training_preprocessor_dl", line 24, in <module>
       training_preprocessor_obj.training_preprocessor_dl()
     PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 269, in input_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 674, in input_tbl_valid
   PL/Python function "training_preprocessor_dl"
   ```
   OK
   
   
   (4)
   Ask for GPUs, but there are no GPUs on the cluster
   ```
   DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                           'image_data_packed',  -- Output table
                                           'species',            -- Dependent variable
                                           'rgb',                -- Independent variable
                                           NULL,                 -- Buffer size
                                           255,                  -- Normalizing constant
                                           NULL,                 -- Number of classes
                                           'gpu_segments'        -- Distribution rules
                                           );
   
   ERROR:  plpy.Error: training_preprocessor_dl: No GPUs configured on hosts. (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "training_preprocessor_dl", line 24, in <module>
       training_preprocessor_obj.training_preprocessor_dl()
     PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
     PL/Python function "training_preprocessor_dl", line 243, in input_preprocessor_dl
   PL/Python function "training_preprocessor_dl"
   
   ```
   OK
   
   
   (5)
   Valid distribution table
   
   ```
   DROP TABLE IF EXISTS segments_to_use;
   CREATE TABLE segments_to_use AS
     SELECT DISTINCT dbid, hostname FROM gp_segment_configuration
     WHERE role='p' AND content>=0;
   SELECT * FROM segments_to_use ORDER BY hostname, dbid;
   
    dbid |       hostname
   ------+-----------------------
       2 | pm-demo-machine-keras
       3 | pm-demo-machine-keras
   (2 rows)
   
   DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
   
   SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                           'image_data_packed',  -- Output table
                                           'species',            -- Dependent variable
                                           'rgb',                -- Independent variable
                                           NULL,                 -- Buffer size
                                           255,                  -- Normalizing constant
                                           NULL,                 -- Number of classes
                                           'segments_to_use'     -- Distribution rules
                                           );
   
   SELECT * FROM image_data_packed_summary;
   
   -[ RECORD 1 ]-------+------------------
   source_table        | image_data
   output_table        | image_data_packed
   dependent_varname   | species
   independent_varname | rgb
   dependent_vartype   | text
   class_values        | {bird,cat,dog}
   buffer_size         | 26
   normalizing_const   | 255
   num_classes         | 3
   gpu_config          | {0,1}
   ```
   The `gpu_config` field shows `{0,1}`, which does not match the `dbid` values `{2,3}`. Where does `{0,1}` come from?
   I think we should report the `dbid` values; otherwise the user might get confused.
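
   One way to investigate is to list `dbid` next to `content` from the same catalog table used to build `segments_to_use`. This is only a sketch; the idea that `{0,1}` might be segment `content` IDs rather than `dbid` values is a guess and is not confirmed by the PR.

   ```sql
   -- Sketch: show dbid alongside content for the primary segments,
   -- to compare both against the values reported in gpu_config.
   SELECT dbid, content, hostname
   FROM gp_segment_configuration
   WHERE role = 'p' AND content >= 0
   ORDER BY dbid;
   ```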
   
   
