Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/06/10 10:35:01 UTC
[jira] [Created] (SPARK-2094) Ensure exactly once semantics for DDL / Commands
Michael Armbrust created SPARK-2094:
---------------------------------------
Summary: Ensure exactly once semantics for DDL / Commands
Key: SPARK-2094
URL: https://issues.apache.org/jira/browse/SPARK-2094
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Fix For: 1.1.0
From [~lian cheng]...
The constraints presented here are:
* The side effect of a command SchemaRDD should take place eagerly;
* The side effect of a command SchemaRDD should take place once and only once;
* When the .collect() method is called, something meaningful, usually the output message lines of the command, should be returned.
Then how about adding a lazy field inside all the physical command nodes to wrap up the side effect and hold the command output? Take the SetCommandPhysical as an example:
{code}
trait PhysicalCommand {
  // Declared as a def because Scala does not allow an abstract lazy val;
  // implementations override it with a lazy val so it is computed only once.
  def commandOutput: Any
}

case class SetCommandPhysical(
    key: Option[String], value: Option[String], output: Seq[Attribute])(
    @transient context: SQLContext)
  extends PhysicalCommand {

  override lazy val commandOutput: Any = {
    // Perform the side effect, and record appropriate output
    ???
  }

  def execute(): RDD[Row] = {
    val row: Row = new GenericRow(Array[Any](commandOutput))
    // parallelize expects a Seq of rows, so wrap the single row
    context.sparkContext.parallelize(Seq(row), 1)
  }
}
{code}
In this way, all the constraints are met:
* Eager evaluation: done by the toRdd call in SchemaRDDLike (PR #948),
* Side effect should take place once and only once: ensured by the lazy commandOutput field,
* Present meaningful output as RDD contents: command output is held by commandOutput and returned in execute().
An additional benefit is that the side-effect logic of each command can be implemented within its own physical command node, instead of adding special cases inside SQLContext.toRdd and/or HiveContext.toRdd.
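To make the "once and only once" argument concrete, here is a minimal stand-alone sketch that does not depend on Spark. Everything in it (LazyCommandSketch, Command, SetCommand, settings, sideEffectRuns, run) is a hypothetical stand-in invented for illustration, not part of the Spark API: settings plays the role of the configuration a SET command would mutate, and run plays the role of the eager toRdd call.

```scala
import scala.collection.mutable

// Stand-alone sketch; all names are hypothetical and none come from Spark.
object LazyCommandSketch {
  // Stand-in for the state that a SET command would mutate.
  val settings = mutable.Map.empty[String, String]
  // Counts how many times the side effect actually runs.
  var sideEffectRuns = 0

  trait Command {
    // Implemented as a lazy val below, so it is evaluated at most once.
    def commandOutput: Seq[String]
    // Stand-in for execute(): only returns the cached output.
    def execute(): Seq[String] = commandOutput
  }

  final class SetCommand(key: String, value: String) extends Command {
    lazy val commandOutput: Seq[String] = {
      sideEffectRuns += 1        // the side effect happens here, once
      settings(key) = value
      Seq(s"$key=$value")        // the output lines held for later calls
    }
  }

  // Stand-in for the eager toRdd call: force the output right away.
  def run(cmd: Command): Command = { cmd.execute(); cmd }
}
```

Calling run on a new SetCommand applies the setting immediately, and any later execute() calls on the same instance return the cached output without touching settings again. Note that the guarantee is per instance: a second SetCommand object would run its own side effect.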
--
This message was sent by Atlassian JIRA
(v6.2#6252)