Posted to issues@spark.apache.org by "Stefan Panayotov (JIRA)" <ji...@apache.org> on 2016/06/23 15:05:16 UTC

[jira] [Commented] (SPARK-16105) PCA Reverse Transformer

    [ https://issues.apache.org/jira/browse/SPARK-16105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346570#comment-15346570 ] 

Stefan Panayotov commented on SPARK-16105:
------------------------------------------

I understand that the 'reverse' operation maps the 15-dimensional subspace into the 96-dimensional space. As in many data science applications, the original higher-dimensional space has domain-specific meaning. The PCA model allows us to choose a 15-dimensional subspace that captures most of the variance in the 96-dimensional space. While in general the reverse transformation is not an 'inverse' operator, in the sense that it is not a bijection, it does return the representation of the 15-dimensional vector in the 96-dimensional space. A well-trained data scientist knows to inspect the impact of their dimensionality reduction in the domain-specific coordinate system in order to better understand the implicit assumptions imposed by their pipelines. This is accomplished by applying the reverse operation and comparing the results with the vector-valued column to which the PCA model was originally applied.
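
As a rough illustration of the check described above, here is a minimal pure-Python sketch. The helper names (transform, reverse_transform, reconstruction_error) are hypothetical and are not part of Spark's PCAModel API. It assumes the forward transform is y = pc^T x with no mean centering, that the columns of pc are orthonormal, and it uses a toy 3x2 matrix as a stand-in for the 96x15 case.

```python
# Hypothetical helpers for illustration only; not Spark's PCAModel API.
# Assumption: pc is stored as d rows of k columns and has orthonormal columns.

def transform(pc, x):
    """Reduce x (length d) to k dimensions: y = pc^T x."""
    d, k = len(pc), len(pc[0])
    return [sum(pc[i][j] * x[i] for i in range(d)) for j in range(k)]

def reverse_transform(pc, y):
    """Map y (length k) back to d dimensions: x_hat = pc y."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in pc]

def reconstruction_error(x, x_hat):
    """Euclidean distance between the original and the reconstructed vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) ** 0.5

# Toy 3 -> 2 reduction standing in for 96 -> 15.
pc = [[1.0, 0.0],
      [0.0, 1.0],
      [0.0, 0.0]]
x = [2.0, -1.0, 0.5]

y = transform(pc, x)                  # reduced representation
x_hat = reverse_transform(pc, y)      # back in the original coordinates
err = reconstruction_error(x, x_hat)  # what the reduction discarded
```

In this toy case the third coordinate is dropped by the reduction, so the reverse operation returns it as zero and the error equals the magnitude of the discarded component. Comparing x_hat with x in the original coordinates is the inspection step the paragraph above refers to.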

If there is any confusion, I can LaTeX out the mathematics for you.
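
For reference, a short sketch of that mathematics under stated assumptions: P denotes the d x k pc matrix (d = 96, k = 15), its columns are taken to be orthonormal, and the forward transform is assumed to apply no mean centering. The symbols x, y, and \hat{x} are introduced here for illustration.

```latex
% Assumptions: P is the d x k pc matrix with orthonormal columns (P^T P = I_k),
% and the forward transform applies no mean centering.
\begin{align*}
y &= P^{\top} x, \qquad x \in \mathbb{R}^{d},\ y \in \mathbb{R}^{k}
  && \text{(forward transform)} \\
\hat{x} &= P y = P P^{\top} x, \qquad \hat{x} \in \mathbb{R}^{d}
  && \text{(reverse transform)} \\
P^{\top} \hat{x} &= (P^{\top} P)\, y = y
  && \text{(reducing again recovers } y \text{)} \\
\lVert x - \hat{x} \rVert^{2} &= \lVert x \rVert^{2} - \lVert y \rVert^{2}
  && \text{(reconstruction error)}
\end{align*}
```

Since P P^{\top} is an orthogonal projector of rank k < d, \hat{x} = x only when x already lies in the column space of P. That is the sense in which the reverse transformation is not a bijection.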

> PCA Reverse Transformer
> -----------------------
>
>                 Key: SPARK-16105
>                 URL: https://issues.apache.org/jira/browse/SPARK-16105
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 1.6.1
>            Reporter: Stefan Panayotov
>            Priority: Minor
>
> The PCA class has a fit method that returns a PCAModel. One of the members of the PCAModel is pc (the principal components matrix). This matrix is available for inspection, but there is no method that uses it to transform back to the original dimension. For example, if I use PCA to reduce the dimensionality of my space from 96 to 15, I get a 96x15 pc matrix. I can do some modeling in the reduced space, and then I need to map back to the original 96-dimensional space. Basically, I need to multiply my 15-dimensional vectors by the 96x15 pc matrix to get back 96-dimensional vectors. Such a method is missing from the PCA model.


