Posted to user@spark.apache.org by "turing.us" <tu...@gmail.com> on 2016/01/12 11:16:31 UTC
Maintain a state till the end of the application
Hi!
I have some application (skeleton):
val sc = new SparkContext($SOME_CONF)
val input = sc.textFile(inputFile)
val result = input
  .map { record =>
    val myState = new MyState() // per-record state, keyed by UUID
    (extractUuid(record), myState) // extractUuid: placeholder for my key extraction
  }
  .filter($SOME_FILTER)
  .sortBy($SOME_SORT)
  .partitionBy(new HashPartitioner(100))
  .reduceByKey((first, current) => current.updateRecord(first))
val report = result.map { case (uuid, state) => state.generateReport(uuid) }.coalesce(1)
report.saveAsTextFile($SOME_OUTPUT)
// classes
case class AccumulatedState() {
  ...
}
case class MyState() {
  var data: AccumulatedState = AccumulatedState()
  ...
}
// EOC
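For context on what the skeleton relies on: reduceByKey merges the values for each key pairwise, first within each partition and then across partitions, so any per-key state has to live inside the values themselves. A plain-Scala sketch of that merge semantics (no Spark dependency; Acc and its fields are illustrative names, not from the code above):

```scala
// Plain-Scala model of reduceByKey: group records by key, then fold
// each key's values with an associative merge function. Spark runs the
// same merge first within each partition, then across partitions.
case class Acc(count: Int, sum: Long) {
  def merge(other: Acc): Acc = Acc(count + other.count, sum + other.sum)
}

def reduceByKeyLocal(records: Seq[(String, Acc)]): Map[String, Acc] =
  records.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).reduce(_ merge _)
  }

val merged = reduceByKeyLocal(Seq(
  ("uuid-a", Acc(1, 10L)),
  ("uuid-b", Acc(1, 5L)),
  ("uuid-a", Acc(1, 7L))
))
// merged("uuid-a") holds the combined state for that UUID
```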
I map the input by UUID, and I have a custom state (MyState, which holds an AccumulatedState).
If the application runs in local mode, or on a real cluster with a single
reducer, the behavior is correct;
but if I run it with multiple reducers, the state gets reset
somewhere during processing.
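One likely explanation (an assumption, since updateRecord isn't shown): with multiple reducers, each partition is reduced independently and the partial results are then merged, so an update function that is not associative and commutative gives different answers than a single reducer does. A small sketch, using subtraction as a deliberately non-associative stand-in for an order-dependent update:

```scala
// A non-associative merge produces different results depending on how
// the data is split across partitions; an associative one does not.
def nonAssoc(first: Int, current: Int): Int = first - current
def assoc(first: Int, current: Int): Int = first + current

val values = Seq(1, 2, 3, 4)

// One reducer: a single fold over all values in order.
val single = values.reduce(nonAssoc) // ((1 - 2) - 3) - 4

// Two reducers: fold each partition, then merge the partial results,
// which is how reduceByKey combines per-partition outputs.
val (p1, p2) = values.splitAt(2)
val multi = nonAssoc(p1.reduce(nonAssoc), p2.reduce(nonAssoc))

// An associative, commutative merge is partition-count independent:
val singleOk = values.reduce(assoc)
val multiOk  = assoc(p1.reduce(assoc), p2.reduce(assoc))
```

Here `single` and `multi` disagree, while `singleOk` and `multiOk` agree, which matches the symptom of results changing once more than one reducer is involved.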
How is shared state maintained over the lifecycle of the application?
I know there are Accumulators and Broadcast variables, but those serve
different use cases (counters, and global read-only data such as lookup tables).
How can I keep such shared state across the program until the end, to generate
my results?
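For the per-UUID accumulation described above, one pattern that may fit (a sketch under the assumption that each key's records should fold into one accumulator) is the aggregateByKey shape: a zero value, a seqOp that folds one record into the accumulator, and a combOp that merges two partition-local accumulators. A plain-Scala model with illustrative names (Report, seqOp, combOp are not from the original code):

```scala
// Model of the aggregateByKey contract: zero + seqOp + combOp.
case class Report(records: Int, bytes: Long)

val zero = Report(0, 0L)
// seqOp: fold a single record (here, its byte count) into the accumulator.
def seqOp(acc: Report, recordBytes: Long): Report =
  Report(acc.records + 1, acc.bytes + recordBytes)
// combOp: merge two accumulators built on different partitions.
def combOp(a: Report, b: Report): Report =
  Report(a.records + b.records, a.bytes + b.bytes)

def aggregateLocal(part: Seq[(String, Long)]): Map[String, Report] =
  part.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).foldLeft(zero)(seqOp)
  }

val input = Seq(("uuid-a", 10L), ("uuid-a", 20L), ("uuid-b", 5L))

// Simulate two reducers: aggregate each partition, then combine.
val (part1, part2) = input.splitAt(2)
val (m1, m2) = (aggregateLocal(part1), aggregateLocal(part2))
val perKey = (m1.keySet ++ m2.keySet).map { k =>
  k -> combOp(m1.getOrElse(k, zero), m2.getOrElse(k, zero))
}.toMap
```

Because combOp is associative and commutative, the result is the same regardless of how many reducers the data is split across, which is the property the per-key state in the skeleton needs.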
Thanks!!!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Maintain-a-state-till-the-end-of-the-application-tp25944.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.