Posted to commits@spark.apache.org by rx...@apache.org on 2014/01/01 02:48:50 UTC

[20/20] git commit: Merge pull request #73 from falaki/ApproximateDistinctCount

Merge pull request #73 from falaki/ApproximateDistinctCount

Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting, and both take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
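
To make the accuracy/memory trade-off concrete, here is a minimal, self-contained HyperLogLog sketch in Python. It is an illustration of the general technique only, not the stream-lib implementation and not the code in this commit. The names HyperLogLog, relative_sd and count_approx_distinct_by_key are hypothetical, and the SHA-1 based hash is an assumption made to keep the example deterministic.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (not stream-lib's implementation)."""

    def __init__(self, relative_sd=0.05):
        # The standard error is roughly 1.04 / sqrt(m) for m registers, so
        # choose the smallest power of two m that meets the requested error.
        # At least 2**7 registers keeps the bias constant used below valid.
        self.p = max(7, math.ceil(math.log2((1.04 / relative_sd) ** 2)))
        self.m = 1 << self.p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")    # 64-bit hash (assumed choice)
        index = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Sketches built on different partitions combine by register-wise max.
        if self.m != other.m:
            raise ValueError("sketches must use the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias correction for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def count_approx_distinct_by_key(pairs, relative_sd=0.05):
    """Hypothetical per-key variant: one sketch per key, fed with its values."""
    sketches = {}
    for key, value in pairs:
        if key not in sketches:
            sketches[key] = HyperLogLog(relative_sd)
        sketches[key].add(value)
    return {key: sketch.count() for key, sketch in sketches.items()}


if __name__ == "__main__":
    hll = HyperLogLog(relative_sd=0.05)
    for i in range(50000):
        hll.add(i)
        hll.add(i)  # duplicates do not change the registers
    print(hll.m, round(hll.count()))
```

In this sketch a smaller relative_sd means more registers and therefore more memory: since the error scales as 1.04 / sqrt(m), halving the error needs roughly four times as many registers. The merge step shows why the structure suits distributed counting, because per-partition sketches can be combined without revisiting the data.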


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/8b8e70eb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/8b8e70eb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/8b8e70eb

Branch: refs/heads/master
Commit: 8b8e70ebde880d08ebb3816b2f4003247559c7f8
Parents: 63b411d bee445c
Author: Reynold Xin <rx...@apache.org>
Authored: Tue Dec 31 17:48:24 2013 -0800
Committer: Reynold Xin <rx...@apache.org>
Committed: Tue Dec 31 17:48:24 2013 -0800

----------------------------------------------------------------------
 core/pom.xml                                    | 1582 +++++++++++++++---
 .../org/apache/spark/api/java/JavaPairRDD.scala |   36 +
 .../org/apache/spark/api/java/JavaRDDLike.scala |   11 +
 .../org/apache/spark/rdd/PairRDDFunctions.scala |   42 +
 .../main/scala/org/apache/spark/rdd/RDD.scala   |   16 +-
 .../spark/util/SerializableHyperLogLog.scala    |   50 +
 .../scala/org/apache/spark/JavaAPISuite.java    |   32 +
 .../spark/rdd/PairRDDFunctionsSuite.scala       |   34 +
 .../scala/org/apache/spark/rdd/RDDSuite.scala   |   13 +
 .../spark/serializer/KryoSerializerSuite.scala  |    4 +
 pom.xml                                         |    5 +
 project/SparkBuild.scala                        |    3 +-
 12 files changed, 1595 insertions(+), 233 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/8b8e70eb/core/src/test/scala/org/apache/spark/JavaAPISuite.java
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/8b8e70eb/pom.xml
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/8b8e70eb/project/SparkBuild.scala
----------------------------------------------------------------------