Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/11/08 11:15:33 UTC

[jira] [Resolved] (SPARK-1196) val variables not available within RDD map on cluster app; are on shell or local

     [ https://issues.apache.org/jira/browse/SPARK-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1196.
------------------------------
    Resolution: Cannot Reproduce

> val variables not available within RDD map on cluster app; are on shell or local
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1196
>                 URL: https://issues.apache.org/jira/browse/SPARK-1196
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Andrew Kerr
>
> When this code
> {code}
> def foo = "foo"
> val bar = "bar"
> val data = sc.parallelize(Seq("a"))
> data.map{a => print(1,foo,bar);a}.map{a => print(2,foo,bar);a}.map{a => print(3,foo,bar);a}.collect()
> {code}
> is run on a cluster in the Spark shell, a slave's stdout is
> {code}
> (1,foo,bar)(2,foo,bar)(3,foo,bar)
> {code}
> as expected.
> However, when the code
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.SparkContext._
> object twitterAggregation extends App {
>   val conf = new SparkConf()
>     .setMaster("spark://xx.compute-1.amazonaws.com:7077")
>     .setAppName("testCase")
>     .setJars(List("target/scala-2.10/spark-test-case_2.10-1.0.jar"))
>     .setSparkHome("/root/spark/")
>   val sc = new SparkContext(conf)
>   def foo = "foo"
>   val bar = "bar"
>   val data = sc.parallelize(Seq("a"))
>   data.map{a => print(1,foo,bar);a}.map{a => print(2,foo,bar);a}.map{a => print(3,foo,bar);a}.collect()
> }
> {code}
> is run against a cluster as an application via sbt, the stdout on a slave is
> {code}
> (1,foo,null)(2,foo,null)(3,foo,null)
> {code}
> The variable declared with val is now null when the anonymous functions in the map are executed.
> When the application is run in local mode, the output is
> {code}
> (1,foo,bar)(2,foo,bar)(3,foo,bar)
> {code}
> as expected.
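> A plausible cause (not confirmed here) is the delayedInit behavior of extends App: the object body, including the assignment of bar, only runs when main executes on the driver, never on the executors, so the field backing bar stays null there, while def foo is re-evaluated on each call and therefore works. A minimal sketch of a workaround, reusing the cluster settings above unchanged, is an explicit main method, which makes bar a local captured by value when the closure is serialized:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Sketch only: explicit main instead of extends App, so vals are
> // initialized before any closure referencing them is serialized.
> object twitterAggregation {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setMaster("spark://xx.compute-1.amazonaws.com:7077")
>       .setAppName("testCase")
>       .setJars(List("target/scala-2.10/spark-test-case_2.10-1.0.jar"))
>       .setSparkHome("/root/spark/")
>     val sc = new SparkContext(conf)
>     def foo = "foo"
>     val bar = "bar" // local val: shipped inside the closure, not read from the singleton
>     val data = sc.parallelize(Seq("a"))
>     data.map { a => print((1, foo, bar)); a }.collect()
>     sc.stop()
>   }
> }
> {code}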
> build.sbt is 
> {code}
> name := "spark-test-case"
> version := "1.0"
> scalaVersion := "2.10.3"
> resolvers ++= Seq("Akka Repository" at "http://repo.akka.io/releases/")
> libraryDependencies ++= Seq("org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating")
> {code}
> To avoid firewall and NAT issues, the project directory is rsynced onto the master, where it is built with SBT 0.13.1:
> {code}
> wget http://repo.scala-sbt.org/scalasbt/sbt-native-packages/org/scala-sbt/sbt/0.13.1/sbt.rpm && rpm --install sbt.rpm
> sbt package && sbt run
> {code}
> The cluster was created with the scripts in the hadoop2 0.9.0 download.


