Posted to user@spark.apache.org by Corey Stubbs <cs...@us.ibm.com> on 2015/07/09 18:52:20 UTC

orderBy + cache is invoking work on Mesos cluster

Spark Version: 1.3.1
Cluster: Mesos 0.22.0
Scala Version: 2.10.4

I am seeing jobs run on my cluster when I call cache on a DataFrame that
was produced by orderBy. I expected the last line of the code below not to
trigger any cluster work. Is there some condition under which cache will do
cluster work?


val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// work is done to load the json into the dataframe
val people = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil
)
val peopleDF = sqlContext.jsonRDD(people).toDF()
// No work is done for the orderBy, as expected
val orderBy = peopleDF.orderBy("name")
// Jobs are run when invoking cache, expectation was nothing would run on
// the cluster
val orderByCache = orderBy.cache



