Posted to issues@spark.apache.org by "Guilherme Braccialli (JIRA)" <ji...@apache.org> on 2017/09/21 22:09:02 UTC
[jira] [Commented] (SPARK-14236) UDAF does not use incomingSchema for update Method
[ https://issues.apache.org/jira/browse/SPARK-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175541#comment-16175541 ]
Guilherme Braccialli commented on SPARK-14236:
----------------------------------------------
+1 to implement this.
As a workaround, I'm using the code below to make the UDAF more readable:
import scala.collection.immutable.ListMap
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// Declare the input columns once; ListMap preserves insertion order for any
// number of columns, so the indices below always match inputSchema.
val inputColumns = ListMap(
  "start" -> TimestampType,
  "end" -> TimestampType
)

override def inputSchema = StructType(inputColumns.map { case (name, dataType) => StructField(name, dataType) }.toArray)

// Precompute name -> positional index so update() never does a map lookup
val inputColumnsNameId = inputColumns.keys.zipWithIndex.toMap

val inputStart = inputColumnsNameId("start")
val inputEnd = inputColumnsNameId("end")
PS: In my tests, resolving field names inside the update function (by looking them up in the inputColumnsNameId map on every row) added significant performance overhead, which is why I created one val holding the resolved index for each input field. I tested with approximately 1 billion rows.
The same solution applies to bufferSchema.
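The pattern above can be sketched without any Spark dependency. The object name, the String stand-in for TimestampType, and the field helper are assumptions for illustration only; in a real UDAF the indices would be used with Row accessors such as input.getTimestamp(inputStart) inside update():

```scala
// Minimal, self-contained sketch of the name-to-index workaround.
// The Spark UDAF scaffolding is omitted; a Seq[Any] stands in for Row.
object NamedIndexWorkaround {
  // Declare input columns once, in order, mirroring inputSchema.
  // A plain String labels the type so the example needs no Spark imports.
  val inputColumns: Seq[(String, String)] = Seq(
    "start" -> "TimestampType",
    "end"   -> "TimestampType"
  )

  // Precompute name -> positional index (the inputColumnsNameId idea)
  val inputColumnsNameId: Map[String, Int] =
    inputColumns.map(_._1).zipWithIndex.toMap

  // Resolve each index once, outside the hot update() path
  val inputStart: Int = inputColumnsNameId("start")
  val inputEnd: Int   = inputColumnsNameId("end")

  // Stand-in for Row.get(i): read a field by its precomputed index
  def field(row: Seq[Any], index: Int): Any = row(index)
}
```

The point of the design is that the map lookup happens once at construction time, and the per-row update only pays for a positional access.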
> UDAF does not use incomingSchema for update Method
> --------------------------------------------------
>
> Key: SPARK-14236
> URL: https://issues.apache.org/jira/browse/SPARK-14236
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Matthias Niehoff
> Priority: Minor
>
> When I specify a schema for the incoming data in a UDAF, the schema is not applied to the incoming row in the update method. I can only access the fields using their numeric indices, not their names. The fields in the row are named input0, input1,...
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)