Posted to dev@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2017/02/10 19:22:54 UTC

benefits of code gen

so i have been looking for a while now at all the catalyst expressions, and
all the relatively complex codegen going on.

so first off i get the benefit of codegen to turn a bunch of chained
iterator transformations into a single codegen stage for spark. that makes
sense to me, because it avoids a bunch of overhead.

but what i am not so sure about is what the benefit is of converting the
actual stuff that happens inside the iterator transformations into codegen.

say we have an expression that has 2 children and creates a struct for
them. why would this be faster in codegen by re-creating the code to do
this in a string (which is complex and error prone) compared to simply having
the codegen call the normal method for this in my class?

i see so much trivial code being re-created in codegen. stuff like this:

  private[this] def castToDateCode(
      from: DataType,
      ctx: CodegenContext): CastFunction = from match {
    case StringType =>
      val intOpt = ctx.freshName("intOpt")
      (c, evPrim, evNull) => s"""
        scala.Option<Integer> $intOpt =
          org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
        if ($intOpt.isDefined()) {
          $evPrim = ((Integer) $intOpt.get()).intValue();
        } else {
          $evNull = true;
        }
       """

is this really faster than simply calling an equivalent function from the
codegen, and keeping the codegen logic restricted to the "unrolling" of
chained iterators?

Re: benefits of code gen

Posted by Koert Kuipers <ko...@tresata.com>.

yes agreed. however i believe nullSafeEval is not used for codegen?

On Fri, Feb 10, 2017 at 4:56 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> Function1 is specialized, but nullSafeEval is Any => Any, so that's still
> going to box in the non-codegened execution path.
>
> On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> based on that i take it that math functions would be primary
>> beneficiaries since they work on primitives.
>>
>> so if i take UnaryMathExpression as an example, would i not get the same
>> benefit if i change it to this?
>>
>> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>>
>>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>>   override def dataType: DataType = DoubleType
>>   override def nullable: Boolean = true
>>   override def toString: String = s"$name($child)"
>>   override def prettyName: String = name
>>
>>   protected override def nullSafeEval(input: Any): Any = {
>>     f(input.asInstanceOf[Double])
>>   }
>>
>>   // name of function in java.lang.Math
>>   def funcName: String = name.toLowerCase
>>
>>   def function(d: Double): Double = f(d)
>>
>>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>>   }
>> }
>>
>> admittedly in this case the benefit in terms of removing complex codegen
>> is not there (the codegen was only one line), but if i can remove codegen
>> here i could also remove it in lots of other places where it does get very
>> unwieldy simply by replacing it with calls to methods.
>>
>> Function1 is specialized, so i think (or hope) that my version does no
>> extra boxes/unboxing.
>>
>> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> With complex types it doesn't work as well, but for primitive types the
>>> biggest benefit of whole stage codegen is that we don't even need to put
>>> the intermediate data into rows or columns anymore. They are just variables
>>> (stored in CPU registers).
>>>
>>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> so i have been looking for a while now at all the catalyst expressions,
>>>> and all the relative complex codegen going on.
>>>>
>>>> so first off i get the benefit of codegen to turn a bunch of chained
>>>> iterators transformations into a single codegen stage for spark. that makes
>>>> sense to me, because it avoids a bunch of overhead.
>>>>
>>>> but what i am not so sure about is what the benefit is of converting
>>>> the actual stuff that happens inside the iterator transformations into
>>>> codegen.
>>>>
>>>> say if we have an expression that has 2 children and creates a struct
>>>> for them. why would this be faster in codegen by re-creating the code to do
>>>> this in a string (which is complex and error prone) compared to simply have
>>>> the codegen call the normal method for this in my class?
>>>>
>>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>>
>>>>   private[this] def castToDateCode(
>>>>       from: DataType,
>>>>       ctx: CodegenContext): CastFunction = from match {
>>>>     case StringType =>
>>>>       val intOpt = ctx.freshName("intOpt")
>>>>       (c, evPrim, evNull) => s"""
>>>>         scala.Option<Integer> $intOpt =
>>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>>         if ($intOpt.isDefined()) {
>>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>>         } else {
>>>>           $evNull = true;
>>>>         }
>>>>        """
>>>>
>>>> is this really faster than simply calling an equivalent functions from
>>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>>> chained iterators?
>>>>
>>>>
>>>
>>
>

Re: benefits of code gen

Posted by Michael Armbrust <mi...@databricks.com>.

Function1 is specialized, but nullSafeEval is Any => Any, so that's still
going to box in the non-codegened execution path.
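
A minimal sketch of the boxing point, not Spark code: `BoxingSketch`, `square`, `evalViaAny` and `evalPrimitive` are hypothetical names. It assumes only that a method typed `Any => Any` has to pass a `Double` around as a boxed `java.lang.Double`, whereas a `Double => Double` call can stay on primitives.

```scala
object BoxingSketch {
  // Stand-in for the `f: Double => Double` constructor argument. Called with
  // a primitive, scalac can use Function1's specialized double variant.
  val square: Double => Double = d => d * d

  // Same shape as nullSafeEval(input: Any): Any. The Any-typed parameter and
  // result force a java.lang.Double box on the way in and on the way out.
  def evalViaAny(input: Any): Any = square(input.asInstanceOf[Double])

  // Primitive-only path, comparable to what generated code could call.
  def evalPrimitive(d: Double): Double = square(d)
}
```

Both paths compute the same value; the difference is only in whether a box is allocated around it.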

On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers <ko...@tresata.com> wrote:

> based on that i take it that math functions would be primary beneficiaries
> since they work on primitives.
>
> so if i take UnaryMathExpression as an example, would i not get the same
> benefit if i change it to this?
>
> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>
>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>   override def dataType: DataType = DoubleType
>   override def nullable: Boolean = true
>   override def toString: String = s"$name($child)"
>   override def prettyName: String = name
>
>   protected override def nullSafeEval(input: Any): Any = {
>     f(input.asInstanceOf[Double])
>   }
>
>   // name of function in java.lang.Math
>   def funcName: String = name.toLowerCase
>
>   def function(d: Double): Double = f(d)
>
>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>   }
> }
>
> admittedly in this case the benefit in terms of removing complex codegen
> is not there (the codegen was only one line), but if i can remove codegen
> here i could also remove it in lots of other places where it does get very
> unwieldy simply by replacing it with calls to methods.
>
> Function1 is specialized, so i think (or hope) that my version does no
> extra boxes/unboxing.
>
> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> With complex types it doesn't work as well, but for primitive types the
>> biggest benefit of whole stage codegen is that we don't even need to put
>> the intermediate data into rows or columns anymore. They are just variables
>> (stored in CPU registers).
>>
>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so i have been looking for a while now at all the catalyst expressions,
>>> and all the relative complex codegen going on.
>>>
>>> so first off i get the benefit of codegen to turn a bunch of chained
>>> iterators transformations into a single codegen stage for spark. that makes
>>> sense to me, because it avoids a bunch of overhead.
>>>
>>> but what i am not so sure about is what the benefit is of converting the
>>> actual stuff that happens inside the iterator transformations into codegen.
>>>
>>> say if we have an expression that has 2 children and creates a struct
>>> for them. why would this be faster in codegen by re-creating the code to do
>>> this in a string (which is complex and error prone) compared to simply have
>>> the codegen call the normal method for this in my class?
>>>
>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>
>>>   private[this] def castToDateCode(
>>>       from: DataType,
>>>       ctx: CodegenContext): CastFunction = from match {
>>>     case StringType =>
>>>       val intOpt = ctx.freshName("intOpt")
>>>       (c, evPrim, evNull) => s"""
>>>         scala.Option<Integer> $intOpt =
>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>         if ($intOpt.isDefined()) {
>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>         } else {
>>>           $evNull = true;
>>>         }
>>>        """
>>>
>>> is this really faster than simply calling an equivalent functions from
>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>> chained iterators?
>>>
>>>
>>
>

Re: benefits of code gen

Posted by Koert Kuipers <ko...@tresata.com>.

thanks for that detailed response!

On Mon, Feb 13, 2017 at 12:56 AM, Sumedh Wale <sw...@snappydata.io> wrote:

> The difference is closure invocation instead of a static java.lang.Math
> call. In many cases the JIT may not be able to perform inlining and related
> code optimizations, though in this specific case it should. This is highly
> dependent on the specific case, but when inlining cannot be done and it
> leads to a method call (especially a virtual call) then the difference is
> quite large: a few nanoseconds per evaluation vs tens of nanoseconds in my
> experiments.
> Serialization of an additional object as a reference can have a measurable
> effect for low-latency jobs, though it can usually be ignored.
>
> What has been observed is that if an expression uses CodegenFallback then
> it becomes an order of magnitude slower or more. Most of that is due to
> UnsafeRow read/write overhead, which is avoided here, but care still needs
> to be taken with (virtual) function calls too. In some cases the JIT does
> inline virtual calls, but this may not always happen. In my experience the
> only reliable case where it does inline is when the virtual call is on a
> local variable that does not change across invocations (e.g. a final local
> variable outside the while loop of a doProduce).
>
> I think what should work better is encapsulating such code in methods of a
> scala object rather than a class; those can be invoked in generated code
> like static methods. Such calls should be equivalent to inline code
> generation in most cases, since the JIT will inline the calls where it
> determines a significant benefit. In some cases such method calls will have
> better CPU instruction cache hits (i.e. if the same inline code is emitted
> multiple times vs common method calls). All this needs thorough
> micro/macro-benchmarking.
>
> However, I don't recall any large pieces of generated code where this can
> help. Most complex pieces, like those in
> HashAggregateExec/SortMergeJoinExec/BroadcastHashJoinExec, are complex
> because they generate schema-specific code (to avoid virtual calls and
> boxing/unboxing, and UnsafeRow read/write in some cases), which is
> significantly faster than the equivalent generic code in doExecute. Or in
> your "castToDateCode" example, I don't see how you can reduce it since the
> bulk of the code is already in the static stringToDate call.
>
>
>
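
A rough sketch of the "methods of a scala object" idea from the message above, under stated assumptions: `MathHelpers`, `safeSqrt`, `StaticCallCodegenSketch` and `genCall` are hypothetical names, and the generator is a plain string builder rather than Spark's `doGenCode`. It relies on scalac emitting static forwarders for a top-level object, which is what lets Java source call its methods like static methods.

```scala
// Hypothetical helper. For a top-level object scalac also emits static
// forwarders, so Java source can call MathHelpers.safeSqrt(x) directly.
object MathHelpers {
  def safeSqrt(d: Double): Double =
    if (d < 0.0) Double.NaN else math.sqrt(d)
}

object StaticCallCodegenSketch {
  // Builds the Java statement a doGenCode-style method might emit. `input`
  // and `result` are names of variables assumed to exist in generated code.
  // Real generated code would need the fully qualified object name.
  def genCall(input: String, result: String): String =
    s"double $result = MathHelpers.safeSqrt($input);"
}
```

The logic lives in ordinary compiled Scala and the generated string shrinks to one call. Whether the JIT then inlines that call is exactly what would need benchmarking.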
> On Saturday 11 February 2017 03:02 AM, Koert Kuipers wrote:
>
> based on that i take it that math functions would be primary beneficiaries
> since they work on primitives.
>
> so if i take UnaryMathExpression as an example, would i not get the same
> benefit if i change it to this?
>
> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>
>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>   override def dataType: DataType = DoubleType
>   override def nullable: Boolean = true
>   override def toString: String = s"$name($child)"
>   override def prettyName: String = name
>
>   protected override def nullSafeEval(input: Any): Any = {
>     f(input.asInstanceOf[Double])
>   }
>
>   // name of function in java.lang.Math
>   def funcName: String = name.toLowerCase
>
>   def function(d: Double): Double = f(d)
>
>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>   }
> }
>
> admittedly in this case the benefit in terms of removing complex codegen
> is not there (the codegen was only one line), but if i can remove codegen
> here i could also remove it in lots of other places where it does get very
> unwieldy simply by replacing it with calls to methods.
>
> Function1 is specialized, so i think (or hope) that my version does no
> extra boxes/unboxing.
>
> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> With complex types it doesn't work as well, but for primitive types the
>> biggest benefit of whole stage codegen is that we don't even need to put
>> the intermediate data into rows or columns anymore. They are just variables
>> (stored in CPU registers).
>>
>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> so i have been looking for a while now at all the catalyst expressions,
>>> and all the relative complex codegen going on.
>>>
>>> so first off i get the benefit of codegen to turn a bunch of chained
>>> iterators transformations into a single codegen stage for spark. that makes
>>> sense to me, because it avoids a bunch of overhead.
>>>
>>> but what i am not so sure about is what the benefit is of converting the
>>> actual stuff that happens inside the iterator transformations into codegen.
>>>
>>> say if we have an expression that has 2 children and creates a struct
>>> for them. why would this be faster in codegen by re-creating the code to do
>>> this in a string (which is complex and error prone) compared to simply have
>>> the codegen call the normal method for this in my class?
>>>
>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>
>>>   private[this] def castToDateCode(
>>>       from: DataType,
>>>       ctx: CodegenContext): CastFunction = from match {
>>>     case StringType =>
>>>       val intOpt = ctx.freshName("intOpt")
>>>       (c, evPrim, evNull) => s"""
>>>         scala.Option<Integer> $intOpt =
>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>         if ($intOpt.isDefined()) {
>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>         } else {
>>>           $evNull = true;
>>>         }
>>>        """
>>>
>>> is this really faster than simply calling an equivalent functions from
>>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>>> chained iterators?
>>>
>>>
>>
>
>

Re: benefits of code gen

Posted by Koert Kuipers <ko...@tresata.com>.

based on that i take it that math functions would be the primary
beneficiaries, since they work on primitives.

so if i take UnaryMathExpression as an example, would i not get the same
benefit if i change it to this?

abstract class UnaryMathExpression(val f: Double => Double, name: String)
  extends UnaryExpression with Serializable with ImplicitCastInputTypes {

  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
  override def dataType: DataType = DoubleType
  override def nullable: Boolean = true
  override def toString: String = s"$name($child)"
  override def prettyName: String = name

  protected override def nullSafeEval(input: Any): Any = {
    f(input.asInstanceOf[Double])
  }

  // name of function in java.lang.Math
  def funcName: String = name.toLowerCase

  def function(d: Double): Double = f(d)

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    val self = ctx.addReferenceObj(name, this, getClass.getName)
    defineCodeGen(ctx, ev, c => s"$self.function($c)")
  }
}

admittedly in this case the benefit in terms of removing complex codegen is
not there (the codegen was only one line), but if i can remove codegen here
i could also remove it in lots of other places where it does get very
unwieldy simply by replacing it with calls to methods.

Function1 is specialized, so i think (or hope) that my version does no
extra boxing/unboxing.

On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <rx...@databricks.com> wrote:

> With complex types it doesn't work as well, but for primitive types the
> biggest benefit of whole stage codegen is that we don't even need to put
> the intermediate data into rows or columns anymore. They are just variables
> (stored in CPU registers).
>
> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> so i have been looking for a while now at all the catalyst expressions,
>> and all the relative complex codegen going on.
>>
>> so first off i get the benefit of codegen to turn a bunch of chained
>> iterators transformations into a single codegen stage for spark. that makes
>> sense to me, because it avoids a bunch of overhead.
>>
>> but what i am not so sure about is what the benefit is of converting the
>> actual stuff that happens inside the iterator transformations into codegen.
>>
>> say if we have an expression that has 2 children and creates a struct for
>> them. why would this be faster in codegen by re-creating the code to do
>> this in a string (which is complex and error prone) compared to simply have
>> the codegen call the normal method for this in my class?
>>
>> i see so much trivial code be re-created in codegen. stuff like this:
>>
>>   private[this] def castToDateCode(
>>       from: DataType,
>>       ctx: CodegenContext): CastFunction = from match {
>>     case StringType =>
>>       val intOpt = ctx.freshName("intOpt")
>>       (c, evPrim, evNull) => s"""
>>         scala.Option<Integer> $intOpt =
>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>         if ($intOpt.isDefined()) {
>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>         } else {
>>           $evNull = true;
>>         }
>>        """
>>
>> is this really faster than simply calling an equivalent functions from
>> the codegen, and keeping the codegen logic restricted to the "unrolling" of
>> chained iterators?
>>
>>
>

Re: benefits of code gen

Posted by Reynold Xin <rx...@databricks.com>.

With complex types it doesn't work as well, but for primitive types the
biggest benefit of whole stage codegen is that we don't even need to put
the intermediate data into rows or columns anymore. They are just variables
(stored in CPU registers).
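
A minimal illustration of that point, not Spark's generated code: `FusedLoopSketch`, `interpreted` and `fused` are hypothetical names, and `Array[Any]` merely stands in for a generic row. The same project, filter and sum pipeline is written once as chained iterators that materialize a row per operator, and once as a single loop whose intermediate value is a local primitive.

```scala
object FusedLoopSketch {
  // Iterator style: each operator hands a generic row (here an Array[Any])
  // to the next one, so every intermediate value is boxed and stored.
  def interpreted(input: Array[Double]): Double = {
    val projected: Iterator[Array[Any]] =
      input.iterator.map(x => Array[Any](x * 2.0))
    val filtered = projected.filter(row => row(0).asInstanceOf[Double] > 10.0)
    filtered.map(row => row(0).asInstanceOf[Double]).sum
  }

  // Whole-stage style: one loop, with the intermediate held in a local
  // primitive that the JIT is free to keep in a register.
  def fused(input: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < input.length) {
      val doubled = input(i) * 2.0 // never written into a row
      if (doubled > 10.0) sum += doubled
      i += 1
    }
    sum
  }
}
```

Both functions return the same result. The second avoids the per-row allocation and boxing, which is much harder to do once the intermediate is a struct or other complex type.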

On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <ko...@tresata.com> wrote:

> so i have been looking for a while now at all the catalyst expressions,
> and all the relative complex codegen going on.
>
> so first off i get the benefit of codegen to turn a bunch of chained
> iterators transformations into a single codegen stage for spark. that makes
> sense to me, because it avoids a bunch of overhead.
>
> but what i am not so sure about is what the benefit is of converting the
> actual stuff that happens inside the iterator transformations into codegen.
>
> say if we have an expression that has 2 children and creates a struct for
> them. why would this be faster in codegen by re-creating the code to do
> this in a string (which is complex and error prone) compared to simply have
> the codegen call the normal method for this in my class?
>
> i see so much trivial code be re-created in codegen. stuff like this:
>
>   private[this] def castToDateCode(
>       from: DataType,
>       ctx: CodegenContext): CastFunction = from match {
>     case StringType =>
>       val intOpt = ctx.freshName("intOpt")
>       (c, evPrim, evNull) => s"""
>         scala.Option<Integer> $intOpt =
>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>         if ($intOpt.isDefined()) {
>           $evPrim = ((Integer) $intOpt.get()).intValue();
>         } else {
>           $evNull = true;
>         }
>        """
>
> is this really faster than simply calling an equivalent functions from the
> codegen, and keeping the codegen logic restricted to the "unrolling" of
> chained iterators?
>
>