Posted to commits@systemds.apache.org by ba...@apache.org on 2024/02/19 20:51:17 UTC

(systemds) branch main updated: [MINOR] Generate Python tSNE builtin

This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
     new 03ccaee6af [MINOR] Generate Python tSNE builtin
03ccaee6af is described below

commit 03ccaee6afc016d83c307734f6e0115f8ea22edf
Author: Sebastian Baunsgaard <ba...@apache.org>
AuthorDate: Mon Feb 19 21:51:04 2024 +0100

    [MINOR] Generate Python tSNE builtin
---
 src/main/python/systemds/operator/algorithm/builtin/tSNE.py | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/main/python/systemds/operator/algorithm/builtin/tSNE.py b/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
index 3c659160c6..49eeee1a3a 100644
--- a/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
+++ b/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
@@ -35,6 +35,16 @@ def tSNE(X: Matrix,
      This function performs dimensionality reduction using tSNE algorithm based on
      the paper: Visualizing Data using t-SNE, Maaten et al.
     
+     There exists a variant of t-SNE, implemented in sklearn, that first reduces the
+     dimensionality of the data using PCA to reduce noise and then applies t-SNE for
+     further dimensionality reduction. A script of this can be found in the tutorials
+     folder: scripts/tutorials/tsne/pca-tsne.dml
+    
+     For direct reference and tips on choosing the dimension for the PCA pre-processing,
+     you can visit:
+     https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py
+     https://lvdmaaten.github.io/tsne/
+    
     
     
     :param X: Data Matrix of shape
@@ -44,9 +54,12 @@ def tSNE(X: Matrix,
     :param lr: Learning rate
     :param momentum: Momentum Parameter
     :param max_iter: Number of iterations
+    :param tol: Tolerance for early stopping in gradient descent
     :param seed: The seed used for initial values.
         If set to -1 random seeds are selected.
     :param is_verbose: Print debug information
+    :param print_iter: Interval at which the L1 norm values are printed. Ignored if
+        is_verbose = FALSE.
     :return: Data Matrix of shape (number of data points, reduced_dims)
     """