Posted to commits@madlib.apache.org by fm...@apache.org on 2020/03/26 20:16:36 UTC

[madlib] branch master updated: add clarification in DL user docs re GPU memory release

This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
     new 2896c24  add clarification in DL user docs re GPU memory release
2896c24 is described below

commit 2896c24acba9f25cc30d1a412ee2d84cc4cf5187
Author: Frank McQuillan <fm...@pivotal.io>
AuthorDate: Thu Mar 26 13:14:41 2020 -0700

    add clarification in DL user docs re GPU memory release
---
 .../postgres/modules/deep_learning/madlib_keras.sql_in    | 15 ++++++++++++++-
 .../deep_learning/madlib_keras_fit_multiple_model.sql_in  | 15 ++++++++++++++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
index e4794a3..75fa56a 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras.sql_in
@@ -84,9 +84,20 @@ but rather imported from an external source.  This is in the section
 called "Predict BYOM" below, where "BYOM" stands for "Bring Your Own Model."
 
 Note that the following MADlib functions are targeting a specific Keras
-version (2.2.4) with a specific Tensorflow kernel version (1.14).
+version (2.2.4) with a specific TensorFlow kernel version (1.14).
 Using a newer or older version may or may not work as intended.
 
+@note CUDA GPU memory cannot be released until the process holding it is terminated. 
+When a MADlib deep learning function is called with GPUs, Greenplum internally 
+creates a process (called a slice) which calls TensorFlow to do the computation. 
+This process holds the GPU memory until one of the following two things happens:
+the query finishes and the user logs out of the Postgres client/session; or 
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.  
+The default value for this timeout is 18 sec [8].  So the recommendation is:
+log out/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before you run another GPU query (you can 
+also set it to a lower value).
+
 @anchor keras_fit
 @par Fit
 The fit (training) function has the following format:
@@ -1620,6 +1631,8 @@ http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
 Yuhao Zhang, and Arun Kumar, Technical Report, Computer Science and Engineering, University of California,
 San Diego https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf.
 
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
 @anchor related
 @par Related Topics
 
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
index cd58d93..b929724 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
@@ -94,6 +94,17 @@ release the disk space once the fit multiple query has completed execution.
 This is not the case for GPDB 6+ where disk space is released during the
 fit multiple query.
 
+@note CUDA GPU memory cannot be released until the process holding it is terminated. 
+When a MADlib deep learning function is called with GPUs, Greenplum internally 
+creates a process (called a slice) which calls TensorFlow to do the computation. 
+This process holds the GPU memory until one of the following two things happens:
+the query finishes and the user logs out of the Postgres client/session; or 
+the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`.  
+The default value for this timeout is 18 sec [8].  So the recommendation is:
+log out/reconnect to the session after every GPU query; or
+wait for `gp_vmem_idle_resource_timeout` before you run another GPU query (you can 
+also set it to a lower value).
+
 @anchor keras_fit
 @par Fit
 The fit (training) function has the following format:
@@ -1381,10 +1392,12 @@ https://adalabucsd.github.io/papers/TR_2019_Cerebro.pdf
 Geoffrey Hinton with Nitish Srivastava and Kevin Swersky,
 http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
 
-[6] Deep learning section of Apache MADlib wiki, https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
+[6] Deep learning section of Apache MADlib wiki https://cwiki.apache.org/confluence/display/MADLIB/Deep+Learning
 
 [7] Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016.
 
+[8] Greenplum Database server configuration parameters https://gpdb.docs.pivotal.io/latest/ref_guide/config_params/guc-list.html
+
 @anchor related
 @par Related Topics
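
The recommendation in the added note can be sketched as a session-level setting. This is an illustrative fragment, not part of the commit; it assumes `gp_vmem_idle_resource_timeout` is session-settable as described in the Greenplum GUC reference [8], and the 10-second value is an arbitrary example:

```sql
-- Check the current idle-resource timeout (default is 18 sec,
-- i.e. 18000 ms).
SHOW gp_vmem_idle_resource_timeout;

-- Optionally lower it for this session (value in milliseconds)
-- so the slice processes, and the GPU memory they hold, are
-- released sooner after a GPU query completes.
SET gp_vmem_idle_resource_timeout = 10000;
```

The alternative, per the note above, is simply to log out and reconnect after each GPU query, which terminates the slice processes immediately.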