You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Ted Chester Jenks <te...@palantir.com.INVALID> on 2022/11/30 12:27:16 UTC

Custom resolution rules that grow query plans

Hello,

I wish to write a custom logical plan rule that modifies the output schema and grows the logical plan. The purpose of the rule is roughly to apply a projection on top of DatasourceV2Relation depending on some condition:


case class MyRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case relation: DataSourceV2Relation if someCondition(relation) =>
      Project(getExpressions(relation), relation)
  }
}


We add this rule to extendedResolutionRules understanding it can’t be a postHocResolutionRule or optimizer rule because it modifies the output scheme and the Project needs to be resolved.

As extendedResolutionRule it runs in the fixed-point “Resolution” batch, meaning the batch keeps running indefinitely as the rule grows the query plan on every iteration.

It is possible to avoid the batch running indefinitely by adding to the relation options or node tags to mark the node was processed. This feels a little hacky. Is there functionality in Spark I am missing that can achieve the desired behavior without resorting to this?

I imagine there may be a rule in spark that deals with this but I could not find it.

If this is not covered, I can draft a contribution to cover this case.

Cheers,
Ted