You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Edoardo Vacchi (JIRA)" <ji...@apache.org> on 2015/05/20 12:07:00 UTC
[jira] [Updated] (SPARK-7754) Use PartialFunction literals instead
of objects in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edoardo Vacchi updated SPARK-7754:
----------------------------------
Description:
Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
The distinction is fairly subtle: in the end, both rule types are supposed to define a method {{apply(plan: LogicalPlan)}}
(where LogicalPlan is either Logical- or Spark-) which returns a transformed plan (or a sequence thereof, in the case
of Strategy).
Ceremonies asides, the body of such method is always of the kind:
{code:java}
def apply(plan: PlanType) = plan match pf
{code}
where `pf` would be some `PartialFunction` of the PlanType:
{code:java}
val pf = {
case ... => ...
}
{code}
This is JIRA is a proposal to introduce utility methods to
a) reduce the boilerplate to define rewrite rules
b) turning them back into what they essentially represent: function types.
These changes would be backwards compatible, and would greatly help in understanding what the code does. Current use of objects is redundant and possibly confusing.
*{{Rule[LogicalPlan]}}*
a) Introduce the utility object
{code:java}
object rule
def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
new Rule[LogicalPlan] {
def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
}
def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
new Rule[LogicalPlan] {
override val ruleName = name
def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
}
{code}
b) progressively replace the boilerplate-y object definitions; e.g.
{code:java}
object MyRewriteRule extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case ... => ...
}
{code}
with
{code:java}
// define a Rule[LogicalPlan]
val MyRewriteRule = rule {
case ... => ...
}
{code}
and/or :
{code:java}
// define a named Rule[LogicalPlan]
val MyRewriteRule = rule.named("My rewrite rule") {
case ... => ...
}
{code}
*Strategies*
A similar solution could be applied to shorten the code for
Strategies, which are total functions
only because they are all supposed to manage the default case,
possibly returning `Nil`. In this case
we might introduce the following utility:
{code:java}
object strategy {
/**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan for each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
protected def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
}
/**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
* The partial function must therefore return a Seq[SparkPlan] for each case.
* Unhandled cases will automatically return Seq.empty
*/
protected def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
}
}
{code}
Usage:
{code:java}
val mystrategy = strategy { case ... => ... }
val seqstrategy = strategy.seq { case ... => ... }
{code}
*Further possible improvements:*
Making the utility methods `implicit`, thereby
further reducing the rewrite rules to:
{code:java}
// define a PartialFunction[LogicalPlan, LogicalPlan]
// the implicit would convert it into a Rule[LogicalPlan] at the use sites
val MyRewriteRule = {
case ... => ...
}
{code}
*Caveats*
Because of the way objects are initialized vs. vals, it might be necessary
reorder instructions so that vals are actually initialized before they are used.
E.g.:
{code:java}
class MyOptimizer extends Optimizer {
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule // <--- might throw NPE
val MyRewriteRule = ...
}
{code}
this is easily fixed by factoring rules out as follows:
{code:java}
class MyOptimizer extends Optimizer with CustomRules {
val MyRewriteRule = ...
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule
}
{code}
or, even better, further modularizing, e.g using traits or container objects:
{code:java}
class MyOptimizer extends Optimizer with CustomRules {
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule
}
trait CustomRules {
val MyRewriteRule = ...
}
{code}
was:
Catalyst rules extend two distinct "rule" types:
{{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for
{{GenericStrategy[SparkPlan]}}).
The distinction is fairly subtle: in the end, both rule types
are supposed to define a method {{apply(plan: LogicalPlan)}}
(where LogicalPlan is either Logical- or Spark-)
which returns a transformed plan (or a sequence thereof, in the case
of Strategy).
Ceremonies asides, the body of such method is always of the kind:
{code:java}
def apply(plan: PlanType) = plan match pf
{code}
where `pf` would be some `PartialFunction` of the PlanType:
{code:java}
val pf = {
case ... => ...
}
{code}
This is JIRA is a proposal to introduce utility methods to
a) reduce the boilerplate to define rewrite rules
b) turning them back into what they essentially represent: function types.
These changes would be backwards compatible, and would greatly help in
understanding what the code does. Current use of objects is redundant
and possibly confusing.
**{{Rule[LogicalPlan]}}**
a) Introduce the utility object
{code:java}
object rule
def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
new Rule[LogicalPlan] {
def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
}
def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
new Rule[LogicalPlan] {
override val ruleName = name
def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
}
{code}
b) progressively replace the boilerplate-y object definitions; e.g.
{code:java}
object MyRewriteRule extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case ... => ...
}
{code}
with
{code:java}
// define a Rule[LogicalPlan]
val MyRewriteRule = rule {
case ... => ...
}
{code}
and/or :
{code:java}
// define a named Rule[LogicalPlan]
val MyRewriteRule = rule.named("My rewrite rule") {
case ... => ...
}
{code}
## Strategies
A similar solution could be applied to shorten the code for
Strategies, which are total functions
only because they are all supposed to manage the default case,
possibly returning `Nil`. In this case
we might introduce the following utility:
{code:java}
object strategy {
/**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan for each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
protected def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
}
/**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
* The partial function must therefore return a Seq[SparkPlan] for each case.
* Unhandled cases will automatically return Seq.empty
*/
protected def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
new Strategy {
def apply(plan: LogicalPlan): Seq[SparkPlan] =
if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
}
}
{code}
Usage:
{code:java}
val mystrategy = strategy { case ... => ... }
val seqstrategy = strategy.seq { case ... => ... }
{code}
**Further possible improvements:**
Making the utility methods `implicit`, thereby
further reducing the rewrite rules to:
{code:java}
// define a PartialFunction[LogicalPlan, LogicalPlan]
// the implicit would convert it into a Rule[LogicalPlan] at the use sites
val MyRewriteRule = {
case ... => ...
}
{code}
**Caveats**
Because of the way objects are initialized vs. vals, it might be necessary
reorder instructions so that vals are actually initialized before they are used.
E.g.:
{code:java}
class MyOptimizer extends Optimizer {
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule // <--- might throw NPE
val MyRewriteRule = ...
}
{code}
this is easily fixed by factoring rules out as follows:
{code:java}
class MyOptimizer extends Optimizer with CustomRules {
val MyRewriteRule = ...
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule
}
{code}
or, even better, further modularizing, e.g using traits or container objects:
{code:java}
class MyOptimizer extends Optimizer with CustomRules {
override val batches: Seq[Batch] =
...
Batch("Other rules", FixedPoint(100),
MyRewriteRule
}
trait CustomRules {
val MyRewriteRule = ...
}
{code}
> Use PartialFunction literals instead of objects in Catalyst
> -----------------------------------------------------------
>
> Key: SPARK-7754
> URL: https://issues.apache.org/jira/browse/SPARK-7754
> Project: Spark
> Issue Type: Improvement
> Reporter: Edoardo Vacchi
>
> Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
> The distinction is fairly subtle: in the end, both rule types are supposed to define a method {{apply(plan: LogicalPlan)}}
> (where LogicalPlan is either Logical- or Spark-) which returns a transformed plan (or a sequence thereof, in the case
> of Strategy).
> Ceremonies asides, the body of such method is always of the kind:
> {code:java}
> def apply(plan: PlanType) = plan match pf
> {code}
> where `pf` would be some `PartialFunction` of the PlanType:
> {code:java}
> val pf = {
> case ... => ...
> }
> {code}
> This is JIRA is a proposal to introduce utility methods to
> a) reduce the boilerplate to define rewrite rules
> b) turning them back into what they essentially represent: function types.
> These changes would be backwards compatible, and would greatly help in understanding what the code does. Current use of objects is redundant and possibly confusing.
> *{{Rule[LogicalPlan]}}*
> a) Introduce the utility object
> {code:java}
> object rule
> def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
> new Rule[LogicalPlan] {
> def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
> }
> def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
> new Rule[LogicalPlan] {
> override val ruleName = name
> def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
> }
> {code}
> b) progressively replace the boilerplate-y object definitions; e.g.
> {code:java}
> object MyRewriteRule extends Rule[LogicalPlan] {
> def apply(plan: LogicalPlan): LogicalPlan = plan transform {
> case ... => ...
> }
> {code}
> with
> {code:java}
> // define a Rule[LogicalPlan]
> val MyRewriteRule = rule {
> case ... => ...
> }
> {code}
> and/or :
> {code:java}
> // define a named Rule[LogicalPlan]
> val MyRewriteRule = rule.named("My rewrite rule") {
> case ... => ...
> }
> {code}
> *Strategies*
> A similar solution could be applied to shorten the code for
> Strategies, which are total functions
> only because they are all supposed to manage the default case,
> possibly returning `Nil`. In this case
> we might introduce the following utility:
> {code:java}
> object strategy {
> /**
> * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
> * The partial function must therefore return *one single* SparkPlan for each case.
> * The method will automatically wrap them in a [[Seq]].
> * Unhandled cases will automatically return Seq.empty
> */
> protected def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
> new Strategy {
> def apply(plan: LogicalPlan): Seq[SparkPlan] =
> if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
> }
> /**
> * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
> * The partial function must therefore return a Seq[SparkPlan] for each case.
> * Unhandled cases will automatically return Seq.empty
> */
> protected def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
> new Strategy {
> def apply(plan: LogicalPlan): Seq[SparkPlan] =
> if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
> }
> }
> {code}
> Usage:
> {code:java}
> val mystrategy = strategy { case ... => ... }
> val seqstrategy = strategy.seq { case ... => ... }
> {code}
> *Further possible improvements:*
> Making the utility methods `implicit`, thereby
> further reducing the rewrite rules to:
> {code:java}
> // define a PartialFunction[LogicalPlan, LogicalPlan]
> // the implicit would convert it into a Rule[LogicalPlan] at the use sites
> val MyRewriteRule = {
> case ... => ...
> }
> {code}
> *Caveats*
> Because of the way objects are initialized vs. vals, it might be necessary
> reorder instructions so that vals are actually initialized before they are used.
> E.g.:
> {code:java}
> class MyOptimizer extends Optimizer {
> override val batches: Seq[Batch] =
> ...
> Batch("Other rules", FixedPoint(100),
> MyRewriteRule // <--- might throw NPE
> val MyRewriteRule = ...
> }
> {code}
> this is easily fixed by factoring rules out as follows:
> {code:java}
> class MyOptimizer extends Optimizer with CustomRules {
> val MyRewriteRule = ...
> override val batches: Seq[Batch] =
> ...
> Batch("Other rules", FixedPoint(100),
> MyRewriteRule
> }
> {code}
> or, even better, further modularizing, e.g using traits or container objects:
> {code:java}
> class MyOptimizer extends Optimizer with CustomRules {
> override val batches: Seq[Batch] =
> ...
> Batch("Other rules", FixedPoint(100),
> MyRewriteRule
> }
> trait CustomRules {
> val MyRewriteRule = ...
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org