Posted to issues@spark.apache.org by "Vinod KC (JIRA)" <ji...@apache.org> on 2015/05/07 09:13:00 UTC

[jira] [Created] (SPARK-7438) Validation Error while running countApproxDistinct with relative accuracy >= 0.38

Vinod KC created SPARK-7438:
-------------------------------

             Summary: Validation Error while running countApproxDistinct with relative accuracy >= 0.38
                 Key: SPARK-7438
                 URL: https://issues.apache.org/jira/browse/SPARK-7438
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Vinod KC
            Priority: Minor


Example code:
val a = sc.parallelize(1 to 10000, 20)
val b = a ++ a ++ a ++ a ++ a
b.countApproxDistinct(0.38)

This throws:
java.lang.IllegalArgumentException: requirement failed: p (3) must be at least 4

Issue 1: When the relative accuracy is >= 0.38, an IllegalArgumentException is thrown because the precision p evaluates to 3.
However, the same input works fine with countApproxDistinctByKey(0.38). The handling of relativeSD should be consistent between countApproxDistinct and countApproxDistinctByKey.
Issue 2: The validation error message "p (3) must be at least 4" gives the caller no clue about what went wrong.
Issue 3: When the relative accuracy is < 0.000017, countApproxDistinct does not show a proper validation error message.
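
To make the thresholds above concrete, here is a minimal sketch of how a relative accuracy could map to a HyperLogLog precision p. The formula p = ceil(2 * log2(1.054 / relativeSD)) is an assumption; it is chosen because it reproduces p = 3 for 0.38 as quoted in the error. The object PrecisionSketch, the helper validatedPrecision, the upper bound of 32 and the clamp to 4 are all hypothetical illustrations of one possible behaviour, not Spark's actual code.

```scala
object PrecisionSketch {
  // Assumed mapping from relative accuracy to precision p.
  // It yields p = 3 for relativeSD = 0.38, matching the reported error.
  def precisionFor(relativeSD: Double): Int =
    math.ceil(2.0 * math.log(1.054 / relativeSD) / math.log(2)).toInt

  // Hypothetical helper: reports problems in terms of relativeSD (the value
  // the caller actually passed) and raises a too-small p to the minimum of 4.
  def validatedPrecision(relativeSD: Double): Int = {
    require(relativeSD > 0, s"relativeSD ($relativeSD) must be positive")
    val p = precisionFor(relativeSD)
    // Assumed upper bound on p; very small accuracies exceed it.
    require(p <= 32,
      s"relativeSD ($relativeSD) is too small: it needs p = $p, but p must be at most 32")
    math.max(p, 4)
  }
}
```

Under these assumptions, PrecisionSketch.precisionFor(0.38) gives 3 while precisionFor(0.37) gives 4, which is why the failure starts at 0.38. validatedPrecision(0.38) returns 4 instead of failing, and validatedPrecision(0.000016) fails with a message that names relativeSD rather than p.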



