Posted to issues@spark.apache.org by "Teng Peng (JIRA)" <ji...@apache.org> on 2017/11/06 03:16:00 UTC

[jira] [Commented] (SPARK-20077) Documentation for ml.stats.Correlation

    [ https://issues.apache.org/jira/browse/SPARK-20077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239846#comment-16239846 ] 

Teng Peng commented on SPARK-20077:
-----------------------------------

[~srowen] On this page, https://spark.apache.org/docs/latest/ml-statistics.html, we already document the Pearson and Spearman coefficients. Just want to make sure: do we need anything beyond what is there? For reference, the current text and example are:

Correlation computes the correlation matrix for the input Dataset of Vectors using the specified method. The output will be a DataFrame that contains the correlation matrix of the column of vectors.

import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row

val data = Seq(
  Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
  Vectors.dense(4.0, 5.0, 0.0, 3.0),
  Vectors.dense(6.0, 7.0, 0.0, 8.0),
  Vectors.sparse(4, Seq((0, 9.0), (3, 1.0)))
)

// toDF needs `import spark.implicits._` in scope (spark-shell imports it automatically).
val df = data.map(Tuple1.apply).toDF("features")
val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
println("Pearson correlation matrix:\n" + coeff1.toString)

val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
println("Spearman correlation matrix:\n" + coeff2.toString)
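
For anyone who wants to sanity-check an entry of the Pearson matrix by hand, here is a minimal plain-Scala sketch that needs no Spark. It is not how Spark computes the matrix internally; the names PearsonSketch, pearson, and column are made up for illustration, and the rows are the four vectors from the example above written out densely.

```scala
// Plain-Scala sketch (assumption: illustrative only, not Spark's implementation).
object PearsonSketch {
  // Pearson correlation of two equally long columns.
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.nonEmpty && x.length == y.length, "columns must be non-empty and of equal length")
    val meanX = x.sum / x.length
    val meanY = y.sum / y.length
    // Sum of products of deviations (unnormalised covariance).
    val cov = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum
    // Sums of squared deviations (unnormalised variances).
    val varX = x.map(a => (a - meanX) * (a - meanX)).sum
    val varY = y.map(b => (b - meanY) * (b - meanY)).sum
    // A zero-variance column makes this 0.0 / 0.0, which is NaN for Doubles.
    cov / math.sqrt(varX * varY)
  }

  // The four input vectors from the example, with sparse entries expanded to zeros.
  val rows: Seq[Array[Double]] = Seq(
    Array(1.0, 0.0, 0.0, -2.0),
    Array(4.0, 5.0, 0.0, 3.0),
    Array(6.0, 7.0, 0.0, 8.0),
    Array(9.0, 0.0, 0.0, 1.0)
  )

  // Extract the i-th feature across all rows.
  def column(i: Int): Seq[Double] = rows.map(r => r(i))

  def main(args: Array[String]): Unit = {
    println(pearson(column(0), column(3))) // roughly 0.40
    println(pearson(column(0), column(2))) // NaN: column 2 is constant
  }
}
```

The value for features 0 and 3 should match the corresponding off-diagonal entry of coeff1, and the constant third feature is why that row and column of the matrix come out as NaN.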



> Documentation for ml.stats.Correlation
> --------------------------------------
>
>                 Key: SPARK-20077
>                 URL: https://issues.apache.org/jira/browse/SPARK-20077
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Timothy Hunter
>            Priority: Minor
>
> Now that (Pearson) correlations are available in spark.ml, we need to write some documentation to go along with this feature. For now, it can simply draw its examples from the unit tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
