You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by "Ruebenacker, Oliver A" <Ol...@altisource.com> on 2014/09/04 22:15:01 UTC

Reduce truncates RDD in standalone, but fine when local.

     Hello,

  In the app below, when I run it with local[1] or local [3], I get the expected result - a list of the square roots of the numbers from 1 to 20.
 When I try the same app as standalone with one or two workers on the same machine, it will only print 1.0.
  Adding print statements into the reduce function reveals that three times it calculated Set(1.0) ++ Set(1.0) to yield Set(1.0).
  Any ideas? Thanks!

     Best, Oliver

Oliver Ruebenacker | Solutions Architect

Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
Oliver.Ruebenacker@Altisource.com<ma...@Altisource.com> | www.Altisource.com

package sandbox

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SquareRoots extends App {

  def sqrt(x: Double, nIters: Long) = {
    var iIter: Long = 0
    var root = 1.0
    while (iIter < nIters) {
      iIter += 1
      root = 0.5 * (root + x / root)
    }
    root
  }

  def format(x:Double) :String = {
    val string = "" + x
    if(string.length > 5) { string.substring(0,5) } else { string }
  }

  val nNums = 20
  val nIters = 1000000000 // for 1e9, runs about 50-55 secs per stage on my laptop with --master local[4]
  val nStages = 10

  System.setProperty("hadoop.home.dir", "c:\\Users\\ruebenac\\winutil\\")

  val conf = new SparkConf().setAppName("Square roots")
  val sc = new SparkContext(conf)

  val logPrefix = " [###] "

  def log(line: String) = { println(logPrefix + line) }

  log("Let's go!")
  for (iStage <- 0 to nStages) {
    log("Starting stage " + iStage)
    val nums = sc.parallelize((1 to nNums).map(_.toDouble))
    val roots = nums.map(sqrt(_, nIters)).map(Set(_)).reduce((roots1, roots2) => roots1 ++ roots2).toList.sorted
    log("Square roots from 1 to " + nNums + " in " + nIters + " iterations:")
    log(roots.map(format(_)).mkString(" "))
    log("Completed stage " + iStage)
  }
  log("Done!")
}
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************

RE: Reduce truncates RDD in standalone, but fine when local.

Posted by "Ruebenacker, Oliver A" <Ol...@altisource.com>.
     Hello,

  I tracked it down to the field nIters being uninitialized when passed to the reduce job while running standalone, but initialized when running local. Must be some strange interaction between Spark and scala.App. If I move the reduce job into a method and make nIters a local field, it works fine.

     Best, Oliver


From: Ruebenacker, Oliver A [mailto:Oliver.Ruebenacker@altisource.com]
Sent: Thursday, September 04, 2014 4:15 PM
To: user@spark.apache.org
Subject: Reduce truncates RDD in standalone, but fine when local.


     Hello,

  In the app below, when I run it with local[1] or local [3], I get the expected result - a list of the square roots of the numbers from 1 to 20.
 When I try the same app as standalone with one or two workers on the same machine, it will only print 1.0.
  Adding print statements into the reduce function reveals that three times it calculated Set(1.0) ++ Set(1.0) to yield Set(1.0).
  Any ideas? Thanks!

     Best, Oliver

Oliver Ruebenacker | Solutions Architect

Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
Oliver.Ruebenacker@Altisource.com<ma...@Altisource.com> | www.Altisource.com

package sandbox

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SquareRoots extends App {

  def sqrt(x: Double, nIters: Long) = {
    var iIter: Long = 0
    var root = 1.0
    while (iIter < nIters) {
      iIter += 1
      root = 0.5 * (root + x / root)
    }
    root
  }

  def format(x:Double) :String = {
    val string = "" + x
    if(string.length > 5) { string.substring(0,5) } else { string }
  }

  val nNums = 20
  val nIters = 1000000000 // for 1e9, runs about 50-55 secs per stage on my laptop with --master local[4]
  val nStages = 10

  System.setProperty("hadoop.home.dir", "c:\\Users\\ruebenac\\winutil\\")

  val conf = new SparkConf().setAppName("Square roots")
  val sc = new SparkContext(conf)

  val logPrefix = " [###] "

  def log(line: String) = { println(logPrefix + line) }

  log("Let's go!")
  for (iStage <- 0 to nStages) {
    log("Starting stage " + iStage)
    val nums = sc.parallelize((1 to nNums).map(_.toDouble))
    val roots = nums.map(sqrt(_, nIters)).map(Set(_)).reduce((roots1, roots2) => roots1 ++ roots2).toList.sorted
    log("Square roots from 1 to " + nNums + " in " + nIters + " iterations:")
    log(roots.map(format(_)).mkString(" "))
    log("Completed stage " + iStage)
  }
  log("Done!")
}
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************
***********************************************************************************************************************

This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses.
***********************************************************************************************************************