You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Edoardo Vacchi (JIRA)" <ji...@apache.org> on 2015/05/20 12:06:00 UTC

[jira] [Created] (SPARK-7754) Use PartialFunction literals instead of objects in Catalyst

Edoardo Vacchi created SPARK-7754:
-------------------------------------

             Summary: Use PartialFunction literals instead of objects in Catalyst
                 Key: SPARK-7754
                 URL: https://issues.apache.org/jira/browse/SPARK-7754
             Project: Spark
          Issue Type: Improvement
            Reporter: Edoardo Vacchi


Catalyst rules extend two distinct "rule" types:
{{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for
{{GenericStrategy[SparkPlan]}}).

The distinction is fairly subtle: in the end, both rule types
are supposed to define a method {{apply(plan: LogicalPlan)}}
(where LogicalPlan is either Logical- or Spark-)
which returns a transformed plan (or a sequence thereof, in the case
of Strategy).

Ceremonies asides, the body of such method is always of the kind:

{code:java}
     def apply(plan: PlanType) = plan match pf
{code}

where `pf` would be some `PartialFunction` of the PlanType:

{code:java}
      val pf = {
        case ... => ...
      }
{code}

This is JIRA is a proposal to introduce utility methods to

  a) reduce the boilerplate to define rewrite rules
  b) turning them back into what they essentially represent: function types.

These changes would be backwards compatible, and would greatly help in
understanding what the code does. Current use of objects is redundant 
and possibly confusing.

**{{Rule[LogicalPlan]}}**

a) Introduce the utility object


{code:java}
  object rule 
    def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
      new Rule[LogicalPlan] {
        def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
      }
    def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
      new Rule[LogicalPlan] {
        override val ruleName = name
        def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
      }

{code}


b) progressively replace the boilerplate-y object definitions; e.g.

{code:java}
    object MyRewriteRule extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case ... => ...
    }

{code}

with


{code:java}
    // define a Rule[LogicalPlan]
    val MyRewriteRule = rule {
      case ... => ...
    }

{code}

and/or :


{code:java}

    // define a named Rule[LogicalPlan]
    val MyRewriteRule = rule.named("My rewrite rule") {
      case ... => ...
    }


{code}

## Strategies

A similar solution could be applied to shorten the code for
Strategies, which are total functions
only because they are all supposed to manage the default case,
possibly returning `Nil`. In this case
we might introduce the following utility:


{code:java}
object strategy {

  /**
   * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
   * The partial function must therefore return *one single* SparkPlan for each case.
   * The method will automatically wrap them in a [[Seq]].
   * Unhandled cases will automatically return Seq.empty
   */
  protected def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
    new Strategy {
      def apply(plan: LogicalPlan): Seq[SparkPlan] =
        if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
    }

  /**
   * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
   * The partial function must therefore return a Seq[SparkPlan] for each case.
   * Unhandled cases will automatically return Seq.empty
   */
  protected def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
    new Strategy {
      def apply(plan: LogicalPlan): Seq[SparkPlan] =
        if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
    }
}

{code}

Usage:



{code:java}
val mystrategy = strategy { case ... => ... }
val seqstrategy = strategy.seq { case ... => ... }

{code}

**Further possible improvements:**


Making the utility methods `implicit`, thereby
further reducing the rewrite rules to:


{code:java}
    // define a PartialFunction[LogicalPlan, LogicalPlan]
    // the implicit would convert it into a Rule[LogicalPlan] at the use sites
    val MyRewriteRule = {
      case ... => ...
    }


{code}

**Caveats**

Because of the way objects are initialized vs. vals, it might be necessary
reorder instructions so that vals are actually initialized before they are used.
E.g.:

{code:java}
class MyOptimizer extends Optimizer {
  override val batches: Seq[Batch] =
      ...
      Batch("Other rules", FixedPoint(100),
        MyRewriteRule // <--- might throw NPE

  val MyRewriteRule = ...
}

{code}

this is easily fixed by factoring rules out as follows:



{code:java}
class MyOptimizer extends Optimizer with CustomRules {
  val MyRewriteRule = ...
  override val batches: Seq[Batch] =
      ...
      Batch("Other rules", FixedPoint(100),
        MyRewriteRule 
}

{code}

or, even better, further modularizing, e.g using traits or container objects:


{code:java}
class MyOptimizer extends Optimizer with CustomRules {
  override val batches: Seq[Batch] =
      ...
      Batch("Other rules", FixedPoint(100),
        MyRewriteRule 
}
trait CustomRules {
  val MyRewriteRule = ...
}
{code}













--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org