Posted to issues@spark.apache.org by "Bettadapura Srinath Sharma (JIRA)" <ji...@apache.org> on 2017/05/30 19:04:04 UTC

[jira] [Commented] (SPARK-20802) kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not normally distributed)

    [ https://issues.apache.org/jira/browse/SPARK-20802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029950#comment-16029950 ] 

Bettadapura Srinath Sharma commented on SPARK-20802:
----------------------------------------------------

In Java (correct behavior), the code:
KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(col1, "norm", mean[1], stdDev[1]);
produces:
Kolmogorov-Smirnov test summary:
degrees of freedom = 0 
statistic = 0.005983051038968901 
pValue = 0.8643736171652615 
No presumption against null hypothesis: Sample follows theoretical distribution.


> kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws net.razorvine.pickle.PickleException when input data is normally distributed (no error when data is not normally distributed)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20802
>                 URL: https://issues.apache.org/jira/browse/SPARK-20802
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 2.1.1
>         Environment: Linux version 4.4.14-smp
> x86/fpu: Legacy x87 FPU detected.
> using command line: 
> bash-4.3$ ./bin/spark-submit ~/work/python/Features.py
> bash-4.3$ pwd
> /home/bsrsharma/spark-2.1.1-bin-hadoop2.7
> export JAVA_HOME=/home/bsrsharma/jdk1.8.0_121
>            Reporter: Bettadapura Srinath Sharma
>
> In Scala (correct behavior), the code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, "norm", means(j), stdDev(j))
> produces:
> 17/05/18 10:52:53 INFO FeatureLogger: Kolmogorov-Smirnov test summary:
> degrees of freedom = 0 
> statistic = 0.005495681749849268 
> pValue = 0.9216108887428276 
> No presumption against null hypothesis: Sample follows theoretical distribution.
> In Python (incorrect behavior), the code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', numericMean[j], numericSD[j])
> causes this error:
> 17/05/17 21:59:23 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
> net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
>  
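A possible workaround for the Python failure, offered as a sketch rather than a confirmed fix: the exception names numpy.dtype, which suggests that the distribution parameters passed to kolmogorovSmirnovTest were NumPy scalars (for example numpy.float64) rather than built-in floats. This is an assumption; the report does not show how numericMean and numericSD were built. Converting the parameters with float() before the call would keep NumPy types out of the pickled arguments. The helper name to_plain_floats is hypothetical and not part of PySpark.

```python
def to_plain_floats(*params):
    """Convert each distribution parameter to a built-in Python float.

    Hypothetical helper, not part of PySpark. float() accepts NumPy
    scalars such as numpy.float64 as well as ints and built-in floats,
    and always returns a plain Python float.
    """
    return [float(p) for p in params]


# Intended use with the call from the report. It needs a running
# SparkContext and the reporter's data, so it is left as a comment:
#
#   mean, sd = to_plain_floats(numericMean[j], numericSD[j])
#   testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', mean, sd)

# Small standalone check that the helper returns built-in floats.
mean, sd = to_plain_floats(1, 2.5)
print(type(mean).__name__, type(sd).__name__)
```

If the error persists after this conversion, the parameters are probably not the cause, and the contents of vecRDD would be the next thing to inspect.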


