You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by AssafMendelson <as...@rsa.com> on 2016/09/04 11:09:39 UTC
Creating a UDF/UDAF using code generation
Hi,
I want to write a UDF/UDAF which provides native processing performance. Currently, when creating a UDF/UDAF in a normal manner the performance is hit because it breaks optimizations.
I tried something like this:
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
import org.apache.spark.util.Utils
import org.apache.spark.sql.catalyst.expressions._
case class genf(child: Expression) extends UnaryExpression with Predicate with ImplicitCastInputTypes {
override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType)
override def toString: String = s"$child < 10"
override def eval(input: InternalRow): Any = {
val value = child.eval(input)
if (value == null)
{
false
} else {
child.dataType match {
case IntegerType => value.asInstanceOf[Int] < 10
}
}
}
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
defineCodeGen(ctx, ev, c => s"($c) < 10")
}
}
However, this doesn't work as some of the underlying classes/traits are private (e.g. AbstractDataType is private) making it problematic to create a new case class.
Is there a way to do it? The idea is to provide a couple of jars with a bunch of functions our team needs.
Thanks,
Assaf.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Creating-a-UDF-UDAF-using-code-generation-tp27652.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.