Posted to issues@spark.apache.org by "Guilherme Braccialli (JIRA)" <ji...@apache.org> on 2017/09/21 22:09:02 UTC

[jira] [Commented] (SPARK-14236) UDAF does not use incomingSchema for update Method

    [ https://issues.apache.org/jira/browse/SPARK-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175541#comment-16175541 ] 

Guilherme Braccialli commented on SPARK-14236:
----------------------------------------------

+1 to implement this.

As a workaround, I'm using the code below to make my code more readable:
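  import org.apache.spark.sql.types._

  // Ordered (name, type) pairs; a Seq keeps the column order as written
  // (an immutable Map with more than 4 entries does not preserve it)
  val inputColumns: Seq[(String, DataType)] = Seq(
    "start" -> TimestampType,
    "end"   -> TimestampType
  )
  override def inputSchema: StructType =
    StructType(inputColumns.map { case (name, dataType) => StructField(name, dataType) }.toArray)
  // Resolve each field name to its position once, outside update()
  val inputColumnsNameId: Map[String, Int] = inputColumns.map(_._1).zipWithIndex.toMap
  val inputStart = inputColumnsNameId("start")
  val inputEnd = inputColumnsNameId("end")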

PS: In my tests, with approximately 1 billion rows, I saw significant performance overhead when resolving field names inside the update function (i.e., looking them up in the inputColumnsNameId map on every call). That's why I created one val holding the index of each input field.
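To make the PS concrete, here is a minimal Spark-free sketch, assuming rows are modelled as IndexedSeq[Any] instead of org.apache.spark.sql.Row. The object name PrecomputedIndices and the duration-summing update are hypothetical. Names are resolved to Int positions once, and the per-row update only does positional access (in a real UDAF that would be something like input.getTimestamp(inputStart)).

```scala
// Spark-free sketch: rows are modelled as IndexedSeq[Any] (an assumption
// for illustration; a real UDAF receives org.apache.spark.sql.Row).
object PrecomputedIndices {
  // Ordered (name, type-name) pairs standing in for the inputSchema columns.
  val inputColumns: Seq[(String, String)] =
    Seq("start" -> "timestamp", "end" -> "timestamp")

  // Name -> position map, built once rather than per row.
  val inputColumnsNameId: Map[String, Int] =
    inputColumns.map(_._1).zipWithIndex.toMap

  // Resolved once; the hot path below only uses these Ints.
  val inputStart: Int = inputColumnsNameId("start")
  val inputEnd: Int = inputColumnsNameId("end")

  // Hypothetical update step: add (end - start), in millis, to a running total.
  def update(total: Long, input: IndexedSeq[Any]): Long = {
    val start = input(inputStart).asInstanceOf[Long]
    val end = input(inputEnd).asInstanceOf[Long]
    total + (end - start)
  }
}
```

Folding update over Seq(IndexedSeq[Any](0L, 1000L), IndexedSeq[Any](5000L, 7500L)) starting from 0L gives 3500L; no map lookup happens per row.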

The same approach applies to bufferSchema.
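Applied to bufferSchema, the same idea might look like this sketch, again without Spark: the buffer is modelled as Array[Any] standing in for MutableAggregationBuffer, and indexByName, the column names count and totalMillis, and the update logic are all hypothetical. With a real buffer the write would be along the lines of buffer(bufferCount) = buffer.getLong(bufferCount) + 1L.

```scala
// Sketch: the same name -> index trick applied to the aggregation buffer.
// The buffer is modelled as Array[Any] (a stand-in for Spark's
// MutableAggregationBuffer); `indexByName` is a hypothetical helper.
object BufferIndices {
  def indexByName(names: Seq[String]): Map[String, Int] = names.zipWithIndex.toMap

  // Resolved once, outside initialize/update.
  val bufferColumnsNameId: Map[String, Int] = indexByName(Seq("count", "totalMillis"))
  val bufferCount: Int = bufferColumnsNameId("count")
  val bufferTotal: Int = bufferColumnsNameId("totalMillis")

  // Mirrors UDAF.initialize: zero every buffer slot by position.
  def initialize(buffer: Array[Any]): Unit = {
    buffer(bufferCount) = 0L
    buffer(bufferTotal) = 0L
  }

  // Mirrors UDAF.update: count one interval and add its duration.
  def update(buffer: Array[Any], durationMillis: Long): Unit = {
    buffer(bufferCount) = buffer(bufferCount).asInstanceOf[Long] + 1L
    buffer(bufferTotal) = buffer(bufferTotal).asInstanceOf[Long] + durationMillis
  }
}
```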

> UDAF does not use incomingSchema for update Method
> --------------------------------------------------
>
>                 Key: SPARK-14236
>                 URL: https://issues.apache.org/jira/browse/SPARK-14236
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Matthias Niehoff
>            Priority: Minor
>
> When I specify a schema for the incoming data in a UDAF, the schema is not applied to the incoming row in the update method. I can only access the fields by their numeric indices, not by their names. The fields in the row are named input0, input1, ...


