Posted to issues@spark.apache.org by "Jose Cambronero (JIRA)" <ji...@apache.org> on 2015/06/24 21:39:04 UTC
[jira] [Created] (SPARK-8598) Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
Jose Cambronero created SPARK-8598:
--------------------------------------
Summary: Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
Key: SPARK-8598
URL: https://issues.apache.org/jira/browse/SPARK-8598
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Jose Cambronero
Priority: Minor
We have implemented a 1-sample, two-sided version of the Kolmogorov-Smirnov test, which tests the null hypothesis that a sample is drawn from a given continuous distribution. The functionality is exposed through three functions. The first takes an RDD[Double] of the data and a lambda that calculates the CDF. The second takes an RDD[Double] and a function of type Iterator[(Double,Double,Double)] => Iterator[Double]; it uses mapPartitions, which is an optimized way to perform the calculation when evaluating the CDF requires a non-serializable object (e.g. the Apache Commons Math real distributions). The third takes an RDD[Double] and a String naming the theoretical distribution to use. The corresponding result class has been added, along with tests in HypothesisTestSuite.
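For readers unfamiliar with the test, the following is a minimal local sketch in plain Python of what a 1-sample, two-sided Kolmogorov-Smirnov test computes. It is not the Spark implementation described in this issue and does not use RDDs. The function names (ks_statistic, ks_p_value_asymptotic, std_normal_cdf) are hypothetical, and the p-value uses the standard asymptotic Kolmogorov series, which is an assumption here and may differ from what the patch uses.

```python
import math


def ks_statistic(sample, cdf):
    """Two-sided 1-sample KS statistic: the largest gap between the
    empirical CDF of `sample` and the theoretical `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the gap
        # is checked on both sides of the jump.
        d_plus = (i + 1) / n - f
        d_minus = f - i / n
        d = max(d, d_plus, d_minus)
    return d


def ks_p_value_asymptotic(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution:
    2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * n * d^2).
    This is a large-sample approximation (an assumption of this sketch)."""
    t = math.sqrt(n) * d
    if t < 0.2:
        # The series converges slowly here, and the true value is 1
        # to within floating-point precision.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def std_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

A call such as ks_statistic(data, std_normal_cdf) followed by ks_p_value_asymptotic(d, len(data)) mirrors the first function described above, where the caller supplies the CDF directly. In a distributed setting the sorted data would be split across partitions, which is where the per-partition variant becomes relevant.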