Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/07/30 07:27:28 UTC

[GitHub] [madlib] fmcquillan99 commented on pull request #506: DL: Add grid/random search for model selection with `generate_model_selection_configs`

fmcquillan99 commented on pull request #506:
URL: https://github.com/apache/madlib/pull/506#issuecomment-665993038


   Errors and issues
   
   (1)
   ```
   SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1,2],              -- model ids from model architecture table
                                            $$
                                            { 
                                             'lr': [1.0, 2.0, 'linear']
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8],
                                              'epochs': [1] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type (‘grid’ or ‘random’, default ‘grid’) 
                                            5, -- num_configs (number of sampled parameters. Default=10) [to limit testing] 
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            );
   ```
   produces
   ```
   InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'float' objects (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "generate_model_selection_configs", line 21, in <module>
       mst_loader = madlib_keras_model_selection.MstSearch(**globals())
     PL/Python function "generate_model_selection_configs", line 42, in wrapper
     PL/Python function "generate_model_selection_configs", line 287, in __init__
     PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
     PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
   PL/Python function "generate_model_selection_configs"
   
   [SQL: SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1,2],              -- model ids from model architecture table
                                            $$
                                            { 'loss': ['categorical_crossentropy'],
                                             'lr': [0.0001, 0.1, 'linear']
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8],
                                              'epochs': [1] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type (‘grid’ or ‘random’, default ‘grid’) 
                                            5, -- num_configs (number of sampled parameters. Default=10) [to limit testing] 
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            );]
   (Background on this error at: http://sqlalche.me/e/2j85)
   ```
   
   Likewise
   ```
   DROP TABLE IF EXISTS mst_table, mst_table_summary;
   
   SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1,2],              -- model ids from model architecture table
                                            $$
                                            { 
                                             'lr': [1.0, 2.0, 'log'],
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8],
                                              'epochs': [1] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type (‘grid’ or ‘random’, default ‘grid’) 
                                            1, -- num_configs (number of sampled parameters. Default=10) [to limit testing] 
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            ); 
                                            
   SELECT * FROM mst_table ORDER BY mst_key;
   ```
   produces
   ```
   InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'numpy.float64' objects (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "generate_model_selection_configs", line 21, in <module>
       mst_loader = madlib_keras_model_selection.MstSearch(**globals())
     PL/Python function "generate_model_selection_configs", line 42, in wrapper
     PL/Python function "generate_model_selection_configs", line 287, in __init__
     PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
     PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
   PL/Python function "generate_model_selection_configs"
   
   [SQL: SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1,2],              -- model ids from model architecture table
                                            $$
                                            { 
                                             'lr': [1.0, 2.0, 'log'],
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8],
                                              'epochs': [1] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type (‘grid’ or ‘random’, default ‘grid’) 
                                            1, -- num_configs (number of sampled parameters. Default=10) [to limit testing] 
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            );]
   (Background on this error at: http://sqlalche.me/e/2j85)
   ```
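Both tracebacks end in `generate_row_string` with a `TypeError` from adding a number to a string. The first error reports `float` and the second reports `numpy.float64`, and the `cannot concatenate 'str' and ...` wording is Python 2's. This suggests a sampled value is being appended to a string with `+`.

Below is a minimal sketch of building the compile-params string with explicit conversion and formatting instead. `format_compile_params` and its `(name, value)` input are hypothetical; this is not the PR's code.

```python
def format_compile_params(optimizer, lr, other_params):
    """Hypothetical helper: build a compile_params string such as
    optimizer='Adam(lr=0.5)',metrics=['accuracy'],loss='categorical_crossentropy'.
    """
    # float() turns numpy.float64 (or int) into a plain float, and str.format
    # converts it to text, so no str + number concatenation happens.
    parts = ["optimizer='{0}(lr={1})'".format(optimizer, float(lr))]
    # repr() keeps the quotes around strings and the brackets around lists.
    parts.extend("{0}={1!r}".format(name, value) for name, value in other_params)
    return ",".join(parts)


print(format_compile_params(
    'Adam', 0.5,
    [('metrics', ['accuracy']), ('loss', 'categorical_crossentropy')]))
# prints: optimizer='Adam(lr=0.5)',metrics=['accuracy'],loss='categorical_crossentropy'
```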
   
   (2)
   For search_type = 'grid' or 'random', the user should be able to enter part of the string, e.g., 'rand' for random or 'g' for grid.  There is a MADlib function that supports this.
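The comment does not name the existing MADlib helper, so here is a minimal plain-Python sketch of the prefix matching. `resolve_search_type` is a hypothetical name. It accepts any non-empty, case-insensitive prefix that identifies exactly one choice. When the argument is NULL (`None`), it falls back to the documented default, `'grid'`.

```python
SEARCH_TYPES = ('grid', 'random')


def resolve_search_type(search_type):
    """Hypothetical helper: map a prefix such as 'rand' or 'g' to a full search type."""
    if search_type is None:
        return 'grid'  # documented default
    prefix = search_type.strip().lower()
    # An empty prefix matches nothing, so it falls through to the error below.
    matches = [name for name in SEARCH_TYPES if prefix and name.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError("DL: 'search_type' must be either 'grid' or 'random'")
    return matches[0]


print(resolve_search_type('rand'))  # random
print(resolve_search_type('g'))     # grid
```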
   
   
   (3)
   Change the name of the function from `generate_model_selection_configs`
   to `generate_model_configs`.
   
   
   (4)
   Remove exclamation marks (!) and random capitalization from error messages. Suggested messages:
   
   "DL: 'num_configs' and 'random_state' must be NULL for grid search"
   
   "DL: Cannot search from a distribution with grid search"
   
   "DL: 'num_configs' cannot be NULL for random search"
   
   "DL: 'search_type' must be either 'grid' or 'random'"
   
   "DL: Please choose a valid distribution type ('linear' or 'log')"
   
   "DL: {0} should be of the format [lower_bound, upper_bound, distribution_type]"
   
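Here is a minimal sketch showing where the first four suggested messages would be raised. It makes several assumptions:

- The names `validate_search_args` and `is_distribution` are hypothetical.
- The parameter grids have already been parsed into Python dicts.
- `search_type` has already been resolved to its full name.

The last two messages, which concern the format of a distribution entry, are left out for brevity.

```python
DISTRIBUTION_TYPES = ('linear', 'log')


def is_distribution(values):
    """Hypothetical test for an entry of the form [lower_bound, upper_bound, distribution_type]."""
    return isinstance(values, list) and len(values) == 3 and values[2] in DISTRIBUTION_TYPES


def validate_search_args(search_type, num_configs, random_state, param_grid):
    """Hypothetical argument checks that raise the suggested error messages."""
    if search_type == 'grid':
        if num_configs is not None or random_state is not None:
            raise ValueError("DL: 'num_configs' and 'random_state' must be NULL for grid search")
        if any(is_distribution(v) for v in param_grid.values()):
            raise ValueError("DL: Cannot search from a distribution with grid search")
    elif search_type == 'random':
        if num_configs is None:
            raise ValueError("DL: 'num_configs' cannot be NULL for random search")
    else:
        raise ValueError("DL: 'search_type' must be either 'grid' or 'random'")
```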
   
   (5)
   In addition to `linear` and `log` sampling, we should add another type
   called `log_near_one`:
   
   ```
   config_dict[cp] = 1.0 - np.power(
       10,
       np.random.uniform(
           np.log10(1.0 - param_values[1]),
           np.log10(1.0 - param_values[0])
       )
   )
   ```
   
   This type of sampling is useful for parameters of exponentially weighted averages, such as momentum, which are very sensitive to changes near 1. It produces more values near 1 than regular log sampling does.
   
   For example, a momentum value β averages roughly the previous 1/(1 - β) values. Momentum values in the range [0.9000, 0.9005] average about the previous 10 values no matter where you are in the range (no difference). By contrast, momentum values in the range [0.9990, 0.9995] average the previous 1000 values at the left end and the previous 2000 values at the right end (a big difference). So you want to generate more samples near the right end to get better coverage.
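The effect can be checked numerically. Below is a minimal sketch of a sampler for the three distribution types, plus the effective-window rule of thumb. It uses Python's standard `random` and `math` modules rather than NumPy. `sample_param` and `effective_window` are hypothetical names, and the `log_near_one` branch mirrors the `np.power`/`np.random.uniform` snippet above.

```python
import math
import random


def sample_param(lower, upper, dist, rng):
    """Hypothetical sampler for one [lower_bound, upper_bound, distribution_type] entry."""
    if dist == 'linear':
        return rng.uniform(lower, upper)
    if dist == 'log':
        # Uniform in log10 space, so every decade is equally likely.
        return 10 ** rng.uniform(math.log10(lower), math.log10(upper))
    if dist == 'log_near_one':
        # Sample (1 - value) on a log scale, which concentrates draws just below 1.
        return 1.0 - 10 ** rng.uniform(math.log10(1.0 - upper), math.log10(1.0 - lower))
    raise ValueError('unknown distribution type: {0}'.format(dist))


def effective_window(beta):
    """Approximate number of past values averaged by an EWMA with decay beta."""
    return 1.0 / (1.0 - beta)


rng = random.Random(0)
for dist in ('linear', 'log', 'log_near_one'):
    draws = [sample_param(0.9, 0.9995, dist, rng) for _ in range(10000)]
    # Share of draws whose effective window exceeds 1000 past values.
    share = sum(d > 0.999 for d in draws) / len(draws)
    print(dist, round(share, 3))
```

With the bounds [0.9, 0.9995], the expected share of draws above 0.999 is about 0.5% for `linear` and `log`, versus about 13% for `log_near_one`.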
   
   
   (6)
   ```
   DROP TABLE IF EXISTS mst_table, mst_table_summary;
   
   SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1],              -- model ids from model architecture table
                                            $$
                                            { 'loss': ['categorical_crossentropy'],
                                             'optimizer': ['Adam'],
                                             'lr': [0.9, 0.95, 'log'],
                                             'metrics': ['accuracy']
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
                                              'epochs': [1, 2, 3, 5, 10, 12] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type
                                            5, -- num_configs
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            ); 
                                            
   SELECT * FROM mst_table ORDER BY mst_key;
   ```
   followed by 
   ```
   SELECT madlib.generate_model_selection_configs(
                                           'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1],              -- model ids from model architecture table
                                            $$
                                            { 'loss': ['categorical_crossentropy'],
                                             'optimizer': ['SGD'],
                                             'metrics': ['accuracy']
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
                                              'epochs': [1, 2, 3, 5, 10, 12] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type
                                            5, -- num_configs
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            ); 
                                            
   SELECT * FROM mst_table ORDER BY mst_key;
   ```
   produces
   ```
   IntegrityError: (psycopg2.errors.UniqueViolation) plpy.SPIError: duplicate key value violates unique constraint "mst_table_model_id_key"  (seg0 10.128.0.41:40000 pid=22297)
   DETAIL:  Key (model_id, compile_params, fit_params)=(1, optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy', epochs=12,batch_size=32) already exists.
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "generate_model_selection_configs", line 22, in <module>
       mst_loader.load()
     PL/Python function "generate_model_selection_configs", line 313, in load
     PL/Python function "generate_model_selection_configs", line 566, in insert_into_mst_table
   PL/Python function "generate_model_selection_configs"
   
   [SQL: SELECT madlib.generate_model_selection_configs( 'model_arch_library', -- model architecture table
                                           'mst_table',          -- model selection table output
                                            ARRAY[1],              -- model ids from model architecture table
                                            $$
                                            { 'loss': ['categorical_crossentropy'],
                                             'optimizer': ['SGD'],
                                             'metrics': ['accuracy']
                                            } 
                                            $$, -- compile_param_grid 
                                            $$ 
                                            { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
                                              'epochs': [1, 2, 3, 5, 10, 12] 
                                            } 
                                            $$, -- fit_param_grid 
                                            
                                            'random', -- search_type
                                            5, -- num_configs
                                            NULL, -- random_state 
                                            NULL -- object table (Default=None)  
                                            );]
   (Background on this error at: http://sqlalche.me/e/gkpj)
   ```
   However, the error occurred only on every second attempt: one pass would succeed, and the next pass would throw the error.
   
   When it does pass, it produces
   ```
    mst_key | model_id |                                        compile_params                                        |        fit_params        
   ---------+----------+----------------------------------------------------------------------------------------------+--------------------------
          1 |        1 | optimizer='Adam(lr=0.9063214445649174)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=10,batch_size=256
          2 |        1 | optimizer='Adam(lr=0.9367722192055232)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=5,batch_size=256
          3 |        1 | optimizer='Adam(lr=0.9212048311857509)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=32
          4 |        1 | optimizer='Adam(lr=0.9193149125403647)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=3,batch_size=256
          5 |        1 | optimizer='Adam(lr=0.9326284661833211)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=256
          6 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=10,batch_size=256
          7 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=5,batch_size=8
          8 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=2,batch_size=1024
          9 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=3,batch_size=32
         10 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=12,batch_size=8
   (10 rows)
   ```
   Is `optimizer='SGD()'...` correct, or should it be `optimizer='SGD'...`?
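The comment does not establish why the duplicate-key error in (6) is intermittent. One way to rule it out when every searched parameter is discrete is to draw combinations without replacement and skip any already present in the output table. In this example, 7 batch sizes × 6 epoch values give 42 combinations.

Below is a minimal sketch under that assumption. `sample_distinct_configs` and the tuple representation of a configuration are hypothetical. A real fix would also have to include `model_id` and the compile parameters, which form the rest of the unique key.

```python
import itertools
import random


def sample_distinct_configs(grid, num_configs, existing, rng):
    """Hypothetical: draw up to num_configs distinct combinations from a fully
    discrete parameter grid, skipping combinations already in `existing`."""
    names = sorted(grid)
    # Enumerate every combination as a tuple of (name, value) pairs.
    combos = [tuple(zip(names, values))
              for values in itertools.product(*(grid[n] for n in names))]
    candidates = [c for c in combos if c not in existing]
    # random.sample draws without replacement, so no combination repeats.
    return rng.sample(candidates, min(num_configs, len(candidates)))


fit_grid = {'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
            'epochs': [1, 2, 3, 5, 10, 12]}
already_loaded = {(('batch_size', 32), ('epochs', 12))}
picked = sample_distinct_configs(fit_grid, 5, already_loaded, random.Random(0))
print(picked)
```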
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org