Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2021/01/08 21:01:14 UTC

[GitHub] [madlib] reductionista commented on a change in pull request #526: DL: AutoML encapsulation

reductionista commented on a change in pull request #526:
URL: https://github.com/apache/madlib/pull/526#discussion_r554188957



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
##########
@@ -79,6 +82,7 @@ class KerasAutoML(object):
         self.model_id_list = sorted(list(set(model_id_list)))
         self.compile_params_grid = compile_params_grid
         self.fit_params_grid = fit_params_grid
+        self.dist_key_col = DISTRIBUTION_KEY_COLNAME

Review comment:
       Currently it's not required, but while working on the Model Hopper refactor I realized it would help a lot with warm start if we eventually do require it.
   
   As I was working on optimizing weight initialization, I realized that if we could rely on model output tables always having a dist key, that would speed things up and avoid unnecessary work. Otherwise the first step has to be copying the table over to one which does have the dist key, which usually involves shuffling the weights around to different segments. If there is no dist key, then we can't assume anything about how the weights are distributed, so there is no way to optimize that part. For all we (or gpdb) know, all of the weights might be on the same segment with none on any other segments.
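   
   To illustrate the copy step described above, here is a minimal sketch (not actual MADlib code; the helper name, table names, and the dist key column name are assumed for illustration) of building the Greenplum DDL that copies an output table into one hash-distributed on the dist key. This copy is exactly the data motion that requiring a dist key up front would let us skip:
   
   ```python
   # Hypothetical sketch, not MADlib's implementation.  The column name below
   # is an assumption standing in for MADlib's DISTRIBUTION_KEY_COLNAME.
   DIST_KEY_COL = '__dist_key__'
   
   def redistribute_sql(src_table, dest_table, dist_key_col=DIST_KEY_COL):
       """Return Greenplum DDL that copies src_table into dest_table,
       hash-distributed on dist_key_col so each row's segment is known.
       When src_table has no dist key, gpdb guarantees nothing about row
       placement, so this copy (and the shuffle it triggers) can't be
       avoided."""
       return ("CREATE TABLE {dest} AS SELECT * FROM {src} "
               "DISTRIBUTED BY ({col})").format(
                   dest=dest_table, src=src_table, col=dist_key_col)
   ```
   
   With a dist key already present on the source table, the first step can read rows in place instead of issuing a copy like the one generated here.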
   
   All newly generated output tables will have the dist key in them (I should make that change to fit also, come to think of it), but because tables from v1.17 won't, I don't require it as an input for warm start yet; we still do the extra, unnecessary shuffling each time for backwards compatibility.
   
   So nothing requires it right now, but the earlier we get this into the codebase, the earlier we can drop warm start compatibility for output tables that are missing a dist key.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org