Posted to reviews@spark.apache.org by debasish83 <gi...@git.apache.org> on 2016/06/05 18:49:54 UTC

[GitHub] spark issue #1110: [SPARK-2174][MLLIB] treeReduce and treeAggregate

Github user debasish83 commented on the issue:

    https://github.com/apache/spark/pull/1110
  
    @mengxr say I have 20 nodes with 16 cores each: do you recommend running treeReduce with 320 partitions and OpenBLAS numThreads=1 per partition for the seqOp, or treeReduce with 20 partitions and OpenBLAS numThreads=16 per partition? Do you have further ideas for decreasing network shuffle in treeReduce/treeAggregate, or is there an open JIRA where we can move this discussion? It looks like Spark already compresses shuffle output with snappy by default; do you also recommend compressing the vector logically?
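
    For reference, a quick way to inspect the compression settings in question from the driver (a sketch assuming the standard config keys; the codec default varies by Spark version):

    // check whether shuffle compression is enabled and which codec is configured
    println(sc.getConf.get("spark.shuffle.compress", "true"))
    println(sc.getConf.get("spark.io.compression.codec", "snappy"))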
    
    SparkContext: 20 nodes, 16 cores each, sc.defaultParallelism = 320
    
    // packed size of a symmetric n x n gram matrix (upper triangle plus diagonal)
    def gramSize(n: Int) = n * (n + 1) / 2
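
    With n = 4096 this gives 8,390,656 entries, so each reduced vector carries about 33.6 MB of Float data before compression:

    // 8,390,656 entries * 4 bytes per Float ~= 33.6 MB per shuffle block
    val payloadBytes = gramSize(4096) * 4L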
    
    // in-place element-wise sum: accumulate v2 into v1 and return v1
    val combOp = (v1: Array[Float], v2: Array[Float]) => {
      var i = 0
      while (i < v1.length) {
        v1(i) += v2(i)
        i += 1
      }
      v1
    }
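
    As an aside, this combine is exactly a BLAS level-1 saxpy, so it could also go through netlib-java to reach OpenBLAS (a sketch, assuming the native backend is on the classpath; note that OpenBLAS tends to multi-thread level-3 far more aggressively than level-1):

    import com.github.fommil.netlib.BLAS.{getInstance => blas}

    // v1 := 1.0f * v2 + v1 in one native call
    val blasCombOp = (v1: Array[Float], v2: Array[Float]) => {
      blas.saxpy(v1.length, 1.0f, v2, 1, v1, 1)
      v1
    }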
    
    val n = gramSize(4096)
    // one n-length vector per partition (320 partitions of one element each)
    val vv = sc.parallelize(0 until sc.defaultParallelism).map(i => Array.fill[Float](n)(0f))
    
    Option 1: 320 partitions, 1 thread on combOp per partition
    
    val start = System.nanoTime()
    vv.treeReduce(combOp, 2)
    val reduceTime = (System.nanoTime() - start) * 1e-9
    reduceTime: Double = 5.6390302430000006
    
    Option 2: 20 partitions, 1 thread on combOp per partition
    
    val coalescedvv = vv.coalesce(20)
    coalescedvv.count // warm-up action before timing
    
    val start = System.nanoTime()
    coalescedvv.treeReduce(combOp, 2)
    val reduceTime = (System.nanoTime() - start) * 1e-9
    reduceTime: Double = 3.9140685640000004
    
    Option 3: 20 partitions, OpenBLAS numThreads=16 per partition
    
    Setting up OpenBLAS on the cluster now; I will update with numbers soon.
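
    To make that run reproducible, the thread count can be forwarded to every executor through Spark's executor environment (a sketch; spark.executorEnv.* is the standard mechanism, and OPENBLAS_NUM_THREADS is the OpenBLAS knob):

    import org.apache.spark.SparkConf

    // conf for the 20-partition, 16-thread run
    val conf = new SparkConf()
      .set("spark.executorEnv.OPENBLAS_NUM_THREADS", "16")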
    
    Let me know your thoughts. I think that if the underlying operations are dense BLAS level-1, level-2, or level-3, running with more OpenBLAS threads and fewer partitions should help decrease cross-partition shuffle.
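
    For completeness, the same sum can also be phrased as treeAggregate, where the zero vector absorbs the in-place mutation instead of the input arrays (a sketch on the same vv as above):

    // seqOp and combOp coincide since input and aggregate types are both Array[Float]
    val sum = vv.treeAggregate(new Array[Float](n))(combOp, combOp, depth = 2)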

