Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2019/05/17 18:32:58 UTC

[GitHub] [madlib] khannaekta opened a new pull request #392: DL: Improve performance for predict

khannaekta opened a new pull request #392: DL: Improve performance for predict
URL: https://github.com/apache/madlib/pull/392
 
 
   JIRA: MADLIB-1343
   
   Performance improvements
   1. Use the PL/Python `SD` dictionary to cache the model so that it is
   built and its weights are set only once, on the first row of each
   segment. This also means that `SD` has to be cleared on the last row
   of each segment.
   
   2. Replace `PythonFunctionBodyOnly` with
   `PythonFunctionBodyOnlyNoSchema` in the SQL for the internal Keras
   predict UDF. `PythonFunctionBodyOnly` made the query much slower
   because it executed the schema query for every row in the test
   table. The internal UDF does not need the schema name, so it now
   uses `PythonFunctionBodyOnlyNoSchema` instead.
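
The caching pattern from item 1 can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `SD` is simulated with an ordinary dict (inside a real PL/Python function it is provided by the runtime), and `build_model`, `predict_row`, `is_first_row` and `is_last_row` are hypothetical names.

```python
# Hypothetical stand-in for the SD dictionary that PL/Python provides to a
# function; inside a real UDF it already exists and must not be redefined.
SD = {}

build_calls = []  # records how often the expensive setup runs


def build_model(weights):
    """Hypothetical placeholder for deserializing a model and setting weights."""
    build_calls.append(1)
    return lambda row: sum(w * v for w, v in zip(weights, row))


def predict_row(row, weights, is_first_row, is_last_row):
    """Score one row, reusing the model cached in SD for this segment."""
    if is_first_row or 'model' not in SD:
        # Expensive setup happens only once per segment.
        SD['model'] = build_model(weights)
    result = SD['model'](row)
    if is_last_row:
        # Drop the cached model so the next query starts clean.
        SD.pop('model', None)
    return result


rows = [[1, 2], [3, 4], [5, 6]]
weights = [0.5, 2]
outputs = [
    predict_row(row, weights, i == 0, i == len(rows) - 1)
    for i, row in enumerate(rows)
]
```

In this sketch the model is built once for the three rows, and `SD` is empty again after the last row.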
   
   Additionally:
   1. Replace `predict_classes` and `predict_proba` with `predict`, since
   non-sequential models do not support `predict_classes`.
   2. Modify the internal Keras predict query so that it no longer joins
   the test table with the model table. The join produced inconsistent
   segment ids, so `SD` was not being set or cleared properly.
   3. Add a try/except block to the internal predict UDF so that `SD` is
   cleared if an error occurs.
   4. Reorder the arguments of the fit and evaluate UDAs.
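
Items 1 and 3 of this list can be illustrated with a short sketch. It assumes that `predict` returns one list of per-class probabilities per input row, which is typical for a softmax classifier but is not stated in this PR. `SD` is again simulated with a plain dict, and `classes_from_probabilities`, `safe_predict` and `model_predict` are hypothetical names.

```python
SD = {}  # hypothetical stand-in for PL/Python's SD dictionary


def classes_from_probabilities(probabilities):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def safe_predict(model_predict, batch):
    """Call predict and derive classes; clear SD before re-raising on error."""
    try:
        return classes_from_probabilities(model_predict(batch))
    except Exception:
        # Do not leave a stale cached model behind for later queries.
        SD.clear()
        raise


def fake_predict(batch):
    """Hypothetical model output: fixed class probabilities for every row."""
    return [[0.1, 0.7, 0.2] for _ in batch]


predicted = safe_predict(fake_predict, [[1.0], [2.0]])
```

Taking the arg-max of the `predict` output gives the class index without relying on `predict_classes`, so the same code path works for sequential and non-sequential models.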
