Posted to commits@systemds.apache.org by ba...@apache.org on 2024/02/19 20:51:17 UTC
(systemds) branch main updated: [MINOR] Generate Python tSNE builtin
This is an automated email from the ASF dual-hosted git repository.
baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/main by this push:
new 03ccaee6af [MINOR] Generate Python tSNE builtin
03ccaee6af is described below
commit 03ccaee6afc016d83c307734f6e0115f8ea22edf
Author: Sebastian Baunsgaard <ba...@apache.org>
AuthorDate: Mon Feb 19 21:51:04 2024 +0100
[MINOR] Generate Python tSNE builtin
---
src/main/python/systemds/operator/algorithm/builtin/tSNE.py | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/main/python/systemds/operator/algorithm/builtin/tSNE.py b/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
index 3c659160c6..49eeee1a3a 100644
--- a/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
+++ b/src/main/python/systemds/operator/algorithm/builtin/tSNE.py
@@ -35,6 +35,16 @@ def tSNE(X: Matrix,
This function performs dimensionality reduction using tSNE algorithm based on
the paper: Visualizing Data using t-SNE, Maaten et al.
+ There exists a variant of t-SNE, implemented in sklearn, that first reduces the
+ dimensionality of the data using PCA to reduce noise and then applies t-SNE for
+ further dimensionality reduction. A script of this can be found in the tutorials
+ folder: scripts/tutorials/tsne/pca-tsne.dml
+
+ For direct reference and tips on choosing the dimension for the PCA pre-processing,
+ you can visit:
+ https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py
+ https://lvdmaaten.github.io/tsne/
+
:param X: Data Matrix of shape
@@ -44,9 +54,12 @@ def tSNE(X: Matrix,
:param lr: Learning rate
:param momentum: Momentum Parameter
:param max_iter: Number of iterations
+ :param tol: Tolerance for early stopping in gradient descent
:param seed: The seed used for initial values.
If set to -1 random seeds are selected.
:param is_verbose: Print debug information
+ :param print_iter: Intervals of printing out the L1 norm values. Parameter not relevant if
+ is_verbose = FALSE.
:return: Data Matrix of shape (number of data points, reduced_dims)
"""