Posted to issues@spark.apache.org by "Suraj Satishkumar Sheth (JIRA)" <ji...@apache.org> on 2014/09/07 19:29:28 UTC
[jira] [Updated] (SPARK-3430) Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other statistical metrics
[ https://issues.apache.org/jira/browse/SPARK-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suraj Satishkumar Sheth updated SPARK-3430:
-------------------------------------------
Summary: Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other statistical metrics (was: Introduce ValueIncrementableHashMapAccumulator to compute Histogram)
> Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other statistical metrics
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-3430
> URL: https://issues.apache.org/jira/browse/SPARK-3430
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Suraj Satishkumar Sheth
>
> Pull request: https://github.com/apache/spark/pull/2314
> Currently, Spark has no hash map that can be used as an accumulator to build a histogram or distribution. This class provides a customized HashMap implementation whose values are incremented rather than overwritten.
> e.g. map += (a, 1) followed by map += (a, 6) leaves (a, 7) in the map.
> This can be used for computing histograms, generating sampling strategies, and computing statistical metrics, e.g. in MLlib.
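> Example usage:
> import scala.util.Try
>
> // data: an RDD[String] of tab-separated records.
> // Slot 0 counts records; slot i counts records whose i-th field parses as a number.
> val map = sc.accumulableCollection(new ValueIncrementableHashMapAccumulator[Int]())
>
> // Use the accumulator directly in the closure rather than through sc.broadcast:
> // broadcast values are read-only copies on the executors, so updates made
> // through them are not merged back into the driver's accumulator.
> data.foreach { record =>
>   map += ((0, 1L))
>   for ((field, idx) <- record.split("\t").zipWithIndex) {
>     // Count the field only if it is numeric; non-numeric fields are skipped.
>     if (Try(field.toDouble).isSuccess) map += ((idx + 1, 1L))
>   }
> }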
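A minimal sketch of the increment semantics described in the issue, assuming plain Scala with no Spark dependency. The class name `IncrementingCounts` and its methods are hypothetical; this is not the `ValueIncrementableHashMapAccumulator` from the pull request. `+=` adds to the stored value instead of replacing it, and `++=` merges another partial map by summing values, which is the kind of combination an accumulator performs when per-task results reach the driver.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the class from the pull request:
// a map whose += adds to the existing value instead of overwriting it.
final class IncrementingCounts[K] extends Serializable {
  private val counts = mutable.HashMap.empty[K, Long]

  // Add `delta` to the value stored under `key`; missing keys start at 0.
  def +=(kv: (K, Long)): this.type = {
    val (key, delta) = kv
    counts(key) = counts.getOrElse(key, 0L) + delta
    this
  }

  // Merge another partial result by summing values key by key.
  // The entries are copied to a list first so merging a map into itself is safe.
  def ++=(other: IncrementingCounts[K]): this.type = {
    for ((key, delta) <- other.counts.toList) this += ((key, delta))
    this
  }

  // Current count for `key`, or 0 if it was never incremented.
  def get(key: K): Long = counts.getOrElse(key, 0L)

  // Immutable snapshot of all counts.
  def toMap: Map[K, Long] = counts.toMap
}
```

With this sketch, `m += (("a", 1L))` followed by `m += (("a", 6L))` leaves `m.get("a")` at 7, matching the issue's example. Passing such a class to `sc.accumulableCollection` would additionally require the collection traits that method's type bound demands (in Spark 1.x, `Growable` and `TraversableOnce`), which this sketch deliberately leaves out.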
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)