Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2019/04/02 21:02:12 UTC

[GitHub] [madlib] njayaram2 commented on a change in pull request #361: Minibatch Preprocessor DL: Add optional num_classes param.

URL: https://github.com/apache/madlib/pull/361#discussion_r271494568
 
 

 ##########
 File path: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
 ##########
 @@ -363,21 +365,70 @@ class MiniBatchPreProcessorDL(MiniBatchPreProcessor):
 
         self._validate_args()
         self.num_of_buffers = self._get_num_buffers()
-        self.to_one_hot_encode = True
 
 Review comment:
   Our 1-hot encoding follows the standard one-hot encoding convention, which differs from `keras.utils.to_categorical`. For example, if there are 3 distinct class values captured in a list `y=[10, 11, 12]`, then the 1-hot encoded vector created by `keras.utils.to_categorical(y)` is of size 13 (largest class value + 1), because Keras treats each class value as a column index. If it is called with `keras.utils.to_categorical(y, num_classes=4)`, it errors out, since the values 10, 11, and 12 are out of range for 4 columns.
   The 1-hot encoding done in MADlib would create a 1-hot encoded vector of size 3 in the first case and of size 4 in the second.
   
   I would say keras' 1-hot encoding is actually not the standard way of doing it.
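
To make the contrast concrete, here is a minimal pure-Python sketch of the two conventions. It does not call Keras or MADlib: `index_based_one_hot` and `distinct_value_one_hot` are hypothetical helper names, and the second one assumes that columns are assigned to the sorted distinct class values and that `num_classes` only pads the width with trailing columns, which may not match the actual MADlib implementation in every detail.

```python
def index_based_one_hot(labels, num_classes=None):
    """Index-based encoding: each label is used directly as a column index."""
    if num_classes is None:
        num_classes = max(labels) + 1
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise IndexError(
                "label %d is out of range for %d classes" % (label, num_classes)
            )
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded


def distinct_value_one_hot(labels, num_classes=None):
    """Distinct-value encoding: sorted distinct labels map to columns 0..k-1.

    Assumption: num_classes, when given, only widens the vector with
    trailing zero columns.
    """
    classes = sorted(set(labels))
    width = len(classes) if num_classes is None else num_classes
    if width < len(classes):
        raise ValueError("num_classes is smaller than the number of distinct labels")
    column = {value: i for i, value in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * width
        row[column[label]] = 1
        encoded.append(row)
    return encoded


y = [10, 11, 12]
print(len(index_based_one_hot(y)[0]))                    # width from max label + 1
print(len(distinct_value_one_hot(y)[0]))                 # width from distinct labels
print(len(distinct_value_one_hot(y, num_classes=4)[0]))  # width from num_classes
```

With `y = [10, 11, 12]`, the index-based helper needs 13 columns and raises `IndexError` when limited to 4, while the distinct-value helper produces 3 columns by default and 4 when `num_classes=4` is passed.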
