You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Tejas Patil <te...@gmail.com> on 2016/10/12 17:26:37 UTC

`Project` not preserving child partitioning ?

See
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L80

Project operator preserves child's sort ordering but for output
partitioning, it does not. I don't see any way projection would alter the
partitioning of the child plan because rows are not passed across
partitions when project happens (and if it does then it would also affect
the sort ordering won't it ?). Am I missing something obvious here ?

Thanks,
Tejas

Re: `Project` not preserving child partitioning ?

Posted by Tejas Patil <te...@gmail.com>.
Sure :)

Thanks,
Tejas

On Wed, Oct 12, 2016 at 11:26 AM, Reynold Xin <rx...@databricks.com> wrote:

> It actually does -- but do it through a really weird way.
>
> UnaryNodeExec actually defines:
>
> trait UnaryExecNode extends SparkPlan {
>   def child: SparkPlan
>
>   override final def children: Seq[SparkPlan] = child :: Nil
>
>   override def outputPartitioning: Partitioning = child.outputPartitioning
> }
>
>
> I think this is very risky because preserving output partitioning should
> not be a property of UnaryNodeExec (e.g. exchange). It would be better
> (safer) to move the output partitioning definition into each of the
> operator and remove it from UnaryExecNode.
>
> Would you be interested in submitting the patch?
>
>
>
> On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <te...@gmail.com>
> wrote:
>
>> See https://github.com/apache/spark/blob/master/sql/core/src
>> /main/scala/org/apache/spark/sql/execution/basicPhysicalOpe
>> rators.scala#L80
>>
>> Project operator preserves child's sort ordering but for output
>> partitioning, it does not. I don't see any way projection would alter the
>> partitioning of the child plan because rows are not passed across
>> partitions when project happens (and if it does then it would also affect
>> the sort ordering won't it ?). Am I missing something obvious here ?
>>
>> Thanks,
>> Tejas
>>
>
>

Re: `Project` not preserving child partitioning ?

Posted by Reynold Xin <rx...@databricks.com>.
It actually does -- but do it through a really weird way.

UnaryNodeExec actually defines:

trait UnaryExecNode extends SparkPlan {
  def child: SparkPlan

  override final def children: Seq[SparkPlan] = child :: Nil

  override def outputPartitioning: Partitioning = child.outputPartitioning
}


I think this is very risky because preserving output partitioning should
not be a property of UnaryNodeExec (e.g. exchange). It would be better
(safer) to move the output partitioning definition into each of the
operator and remove it from UnaryExecNode.

Would you be interested in submitting the patch?



On Wed, Oct 12, 2016 at 10:26 AM, Tejas Patil <te...@gmail.com>
wrote:

> See https://github.com/apache/spark/blob/master/sql/core/
> src/main/scala/org/apache/spark/sql/execution/
> basicPhysicalOperators.scala#L80
>
> Project operator preserves child's sort ordering but for output
> partitioning, it does not. I don't see any way projection would alter the
> partitioning of the child plan because rows are not passed across
> partitions when project happens (and if it does then it would also affect
> the sort ordering won't it ?). Am I missing something obvious here ?
>
> Thanks,
> Tejas
>