Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2019/05/17 18:32:58 UTC
[GitHub] [madlib] khannaekta opened a new pull request #392: DL: Improve
performance for predict
khannaekta opened a new pull request #392: DL: Improve performance for predict
URL: https://github.com/apache/madlib/pull/392
JIRA: MADLIB-1343
Performance improvements:
1. Use SD (the PL/Python dictionary that persists between calls to the
same function) to cache the model, so the weights are set only once, on
the first row of each segment. This also means that SD has to be
cleared on the last row of each segment.
2. Replace `PythonFunctionBodyOnly` with
`PythonFunctionBodyOnlyNoSchema` in the internal Keras predict SQL.
`PythonFunctionBodyOnly` made the query much slower because it
executed the schema query for every row in the test table. The
internal UDF does not need to know the schema name, so
`PythonFunctionBodyOnlyNoSchema` is sufficient.
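The caching pattern in item 1 can be sketched roughly as follows. This is
a minimal illustration, not the code from the PR: `SD` is replaced by a
plain dict so the snippet runs outside the database, and `FakeModel`,
`load_model`, `predict_row`, `current_row` and `total_rows` are
hypothetical names chosen for the example.

```python
# Stand-in for PL/Python's SD dictionary. In a real UDF the runtime
# supplies SD; a plain dict is used here so the sketch runs anywhere.
SD = {}

# Counts how often the (expensive) model setup runs. Illustration only.
load_calls = {"n": 0}


class FakeModel:
    """Hypothetical stand-in for a deserialized Keras model."""

    def __init__(self):
        self.weights = None

    def set_weights(self, weights):
        self.weights = list(weights)

    def predict(self, row):
        # Toy prediction: dot product of the row with the weights.
        return sum(w * x for w, x in zip(self.weights, row))


def load_model(weights):
    """Build the model and set its weights (the step worth caching)."""
    load_calls["n"] += 1
    model = FakeModel()
    model.set_weights(weights)
    return model


def predict_row(row, weights, current_row, total_rows):
    """Hypothetical per-row UDF body.

    current_row and total_rows are assumed to be counted per segment.
    """
    if "model" not in SD:
        # First row on this segment: build once and cache.
        SD["model"] = load_model(weights)
    result = SD["model"].predict(row)
    if current_row == total_rows:
        # Last row on this segment: release the cached model.
        SD.clear()
    return result
```

Calling `predict_row` for three rows of one segment runs `load_model`
once and leaves `SD` empty afterwards.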
Additionally:
1. Replace `predict_classes` and `predict_proba` with `predict`, since
non-sequential models do not support `predict_classes`.
2. Modify the internal Keras predict query so that it no longer joins
the test table with the model table. The join produced inconsistent
segment ids, which prevented SD from being set and cleared correctly.
3. Add a try/except block in the internal predict UDF so that SD is
cleared if an error occurs.
4. Reorder the arguments of the fit and evaluate UDA.
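
Items 1 and 3 can be illustrated together. With only `predict`
available, a class label can be recovered by taking the index of the
largest probability, and wrapping the call in try/except lets the UDF
drop its cached state on failure. The sketch below assumes a
multi-class output with one probability per class; `SD` is again a
plain dict, and `FakeModel`, `class_from_probabilities` and
`predict_class` are hypothetical names, not functions from MADlib or
Keras.

```python
# Stand-in for PL/Python's SD; supplied by the runtime in a real UDF.
SD = {}


class FakeModel:
    """Hypothetical stand-in for a model that only offers predict()."""

    def predict(self, batch):
        # Reject malformed rows so the error path can be exercised.
        if any(x is None for row in batch for x in row):
            raise ValueError("bad input row")
        # One probability vector per input row (toy values).
        return [[0.1, 0.7, 0.2] for _ in batch]


def class_from_probabilities(probs):
    """Index of the largest probability, i.e. an argmax over classes."""
    return max(range(len(probs)), key=lambda i: probs[i])


def predict_class(row):
    """Hypothetical UDF body: predict one row, clearing SD on failure."""
    try:
        model = SD.setdefault("model", FakeModel())
        probs = model.predict([row])[0]
        return class_from_probabilities(probs)
    except Exception:
        # Drop cached state so a later query does not reuse it.
        SD.clear()
        raise
```

Re-raising after the cleanup keeps the original error visible to the
caller while still leaving `SD` empty.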