Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/09/29 19:57:59 UTC

[GitHub] [madlib] Advitya17 commented on pull request #518: DL: [AutoML] Add support for Hyperopt on top of MOP for MPP + AutoML best-so-far

Advitya17 commented on pull request #518:
URL: https://github.com/apache/madlib/pull/518#issuecomment-700952240


   > (6)
   > fmin definition
   > https://github.com/hyperopt/hyperopt/wiki/FMin
   > fmin(loss, space, algo, max_evals)
   > 
   > Looks like this PR is setting max_evals = num_models/num_segments in `get_configs_list()`. For one thing I'm not sure that
   > 
   > ```
   > self.num_workers = get_seg_number() * get_segments_per_host()
   > ```
   > 
   > gives the total number of workers. On a 1-host, 2-segments-per-host database this returned 4 instead of the expected 2. Also, this needs to be consistent with the distribution rules set in the mini-batch preprocessor.
   > 
   > (7)
   > The function `get_configs_list()` may not be distributing the workload correctly to the segments.
   > For
   > 
   > ```
   >     num_models = 3
   >     num_workers = 3
   > ```
   > 
   > it returns `[(1, 3)]` which seems OK. But for
   > 
   > ```
   >     num_models = 5
   >     num_workers = 3
   > ```
   > 
   > it returns `[(1, 5)]` which does not seem OK. It will not do any hyperopt updates if all 5 configs are grouped together. I would have expected `[(1, 3), (4, 5)]`. For
   > 
   > ```
   >     num_models = 20
   >     num_workers = 3
   > ```
   > 
   > it returns `[(1, 4), (5, 8), (9, 11), (12, 14), (15, 17), (18, 20)]` which means it is running 4 configs on 3 segments multiple times which does not seem efficient. I would have expected something like `[(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18),(19, 20)]` which runs 3 configs at a time on the 3 segments.
   > 
   > (8)
   > In `find_hyperopt_config()`, confirm that the loss for _each_ model is passed to hyperopt after training, and not just the loss of the best one from the group.
   > 
   > (9)
   > defaults
   > Seems like hyperband defaults are being used for hyperopt in the case that the user does not specify them. That will probably throw an error.
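
   For reference on the `fmin` signature quoted in (6), here is the standard standalone usage pattern (an illustrative sketch with a placeholder space and objective, not the MADlib code path):

   ```
   # Minimal standalone Hyperopt example: fmin(fn, space, algo, max_evals).
   # The search space and objective below are placeholders for illustration.
   from hyperopt import fmin, tpe, hp, Trials

   space = {'lr': hp.loguniform('lr', -9, -2)}

   def objective(params):
       # Stand-in loss; in the AutoML setting this would be the validation
       # loss produced by training a model with these hyperparameters.
       return (params['lr'] - 0.01) ** 2

   trials = Trials()
   best = fmin(fn=objective, space=space, algo=tpe.suggest,
               max_evals=20, trials=trials)
   print(best)  # best hyperparameters found, e.g. {'lr': ...}
   ```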
   
   (7) 
   
   (a) As per my discussions with Domino, the current implementation seems to run more than one round when `number of models >= 2 * number of segments/workers in the cluster`. That said, even though the latter schedule `[(1, 3), (4, 5)]` may leave a worker idle, it may be a better approach for Hyperopt.
   
   (b) In reference to the `num_models = 20, num_workers = 3` example, I believe this behavior was per our earlier discussions, to minimize the number of model training function calls. Observe that the former schedule runs 4 configs on 3 workers for 2 rounds and then 3 configs on 3 workers for all remaining rounds (since 3 does not evenly divide 20). The latter schedule runs only 2 configs on 3 workers in the last round, which also leaves a worker idle (in addition to one extra model training function call compared to the former approach).
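
   For illustration, here is a minimal sketch of the round-capped schedule discussed above (a hypothetical helper, not the PR's actual `get_configs_list()`), which never assigns more than `num_workers` configs in a round:

   ```
   # Hedged sketch: cap each round at num_workers configs so Hyperopt gets a
   # TPE update after every round, at the cost of a possible idle worker (and
   # one extra model training call) in the final round.
   def round_capped_schedule(num_models, num_workers):
       schedule, start = [], 1
       while start <= num_models:
           end = min(start + num_workers - 1, num_models)
           schedule.append((start, end))
           start = end + 1
       return schedule

   print(round_capped_schedule(5, 3))   # [(1, 3), (4, 5)]
   print(round_capped_schedule(20, 3))  # [(1, 3), (4, 6), (7, 9), (10, 12),
                                        #  (13, 15), (16, 18), (19, 20)]
   ```

   This makes the trade-off in (b) explicit: 7 rounds with one partially idle final round, versus the current 6 rounds that run groups of 4 configs on 3 workers in the first two rounds.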
   
   (8) I believe all the losses are already passed on to hyperopt as part of my implementation, in the `Hyperopt TPE Update` for loop.
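
   For context on how every loss can be reported back, the low-level Hyperopt API is typically driven roughly as follows (a generic sketch of the pattern, not the PR's exact `Hyperopt TPE Update` loop; the search space and loss are placeholders):

   ```
   # Generic sketch: ask TPE for a batch of configs, "train" each one, attach
   # the loss of EVERY config to its trial document, and commit the whole
   # batch so the next round's suggestions can use all observed losses.
   from hyperopt import hp, tpe, Trials, STATUS_OK, space_eval
   from hyperopt.base import Domain, JOB_STATE_DONE

   space = {'lr': hp.loguniform('lr', -9, -2)}   # placeholder search space
   domain = Domain(lambda params: 0, space)      # fn unused; losses injected below
   trials = Trials()
   seed = 0

   for round_num in range(3):                    # e.g. 3 scheduling rounds
       batch = []
       for _ in range(3):                        # e.g. 3 configs per round
           seed += 1
           batch.extend(tpe.suggest(trials.new_trial_ids(1), domain, trials, seed))

       for doc in batch:
           vals = {k: v[0] for k, v in doc['misc']['vals'].items() if v}
           params = space_eval(space, vals)
           doc['result'] = {'loss': (params['lr'] - 0.01) ** 2,  # stand-in loss
                            'status': STATUS_OK}
           doc['state'] = JOB_STATE_DONE         # mark the trial as completed
       trials.insert_trial_docs(batch)
       trials.refresh()

   print(min(trials.losses()))
   ```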
   
   (9) Yes, I did observe that. I tried to change the global defaults in the SQL file based on the chosen AutoML method, but ran into errors in the process. In my current implementation, the hyperband params are the defaults, so the user may need to explicitly specify hyperopt params when using hyperopt.
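
   As a rough illustration of the alternative being described (method-dependent defaults resolved in the Python layer rather than a single set of SQL-level defaults), with parameter names and values that are assumptions for illustration only:

   ```
   # Hypothetical sketch; the parameter names/values below are illustrative
   # assumptions, not the module's actual defaults.
   HYPERBAND_DEFAULTS = {'R': 6, 'eta': 3, 'skip_last': 0}
   HYPEROPT_DEFAULTS = {'num_configs': 20, 'num_iterations': 5, 'algorithm': 'tpe'}

   def resolve_automl_params(automl_method, automl_params=None):
       """Merge user-supplied params over the defaults of the chosen method."""
       defaults = (HYPERBAND_DEFAULTS if automl_method.lower() == 'hyperband'
                   else HYPEROPT_DEFAULTS)
       resolved = dict(defaults)
       resolved.update(automl_params or {})
       return resolved

   # e.g. resolve_automl_params('hyperopt', {'num_configs': 10})
   #      -> {'num_configs': 10, 'num_iterations': 5, 'algorithm': 'tpe'}
   ```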
   
   (10) The name changes sound good to me.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org