Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/01/24 13:00:38 UTC

[jira] [Resolved] (SPARK-3430) Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other statistical metrics

     [ https://issues.apache.org/jira/browse/SPARK-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-3430.
------------------------------
    Resolution: Won't Fix

PR says this is WontFix

> Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other statistical metrics
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3430
>                 URL: https://issues.apache.org/jira/browse/SPARK-3430
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Suraj Satishkumar Sheth
>
> Pull request: https://github.com/apache/spark/pull/2314
> Currently, Spark has no hash map that can be used as an accumulator to produce a histogram or distribution. This class provides a customized HashMap implementation whose values can be incremented.
> e.g. map += (a,1) followed by map += (a,6) results in (a,7)
> This has various applications, such as histogram computation, sampling strategy generation, and statistical metric computation, in MLlib and elsewhere.
> Example usage:
>     import scala.util.Try
>
>     // Map from column index to count, backed by the accumulator proposed in the PR.
>     // Accumulators are referenced directly in closures, so no broadcast is needed.
>     val countMap = sc.accumulableCollection(new ValueIncrementableHashMapAccumulator[Int]())
>
>     data.foreach { record =>
>       // Key 0 counts the total number of records.
>       countMap += ((0, 1L))
>       // Key i (1-based) counts the records whose i-th tab-separated field parses as a Double.
>       for ((field, idx) <- record.split("\t").zipWithIndex) {
>         if (Try(field.toDouble).isSuccess) {
>           countMap += ((idx + 1, 1L))
>         }
>       }
>     }
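
For readers without the pull request at hand, the increment-on-add behaviour described in the quoted text can be sketched in plain Scala, without Spark. This is only an illustration under assumptions: IncrementingMap and its methods are hypothetical names, not the ValueIncrementableHashMapAccumulator class from the PR, and the merge step is a guess at how per-task partial maps would be combined.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the proposed accumulator; NOT the class from the pull request.
final class IncrementingMap[K] extends Serializable {
  private val counts = mutable.HashMap.empty[K, Long]

  // Add `delta` to the value stored under `key`, starting from 0 if the key is absent.
  def +=(entry: (K, Long)): this.type = {
    val (key, delta) = entry
    counts(key) = counts.getOrElse(key, 0L) + delta
    this
  }

  // Merge another partial map into this one (assumed to mirror combining per-task results).
  def ++=(other: IncrementingMap[K]): this.type = {
    other.counts.foreach { case (k, v) => this += ((k, v)) }
    this
  }

  // Immutable snapshot of the current counts.
  def toMap: Map[K, Long] = counts.toMap
}

object IncrementingMapDemo {
  def main(args: Array[String]): Unit = {
    val m = new IncrementingMap[String]
    m += (("a", 1L))
    m += (("a", 6L))
    println(m.toMap) // the entry for "a" now holds 1 + 6
  }
}
```

Adding to an existing key sums the values rather than overwriting them, which is what makes per-key counts (and hence histograms) possible with a single accumulator.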


