Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/21 21:36:35 UTC

[GitHub] pgandhi999 edited a comment on issue #23778: [SPARK-24935][SQL] : Problem with Executing Hive UDF's from Spark 2.2 Onwards

URL: https://github.com/apache/spark/pull/23778#issuecomment-466176136
 
 
   Sure @cloud-fan. Thank you for your response. As far as I understand Hive UDAFs, they fall roughly into two types: those that support partial aggregation (modes PARTIAL1, PARTIAL2 and FINAL) and those that do not (mode COMPLETE). For the Hive UDAFs that support partial aggregation, there are five phases:
   - **Initialize:** The aggregation buffers for PARTIAL1 Mode and PARTIAL2 Mode are created in this phase.
   - **Iterate (Update):** This phase processes a new row of data into the aggregation buffer created for PARTIAL1.
   - **TerminatePartial:** Returns the contents of the aggregation buffer.
   - **Merge:** Merges a partial aggregation, returned by calling terminatePartial() on the PARTIAL1 aggregation buffer, into the aggregation in progress on the PARTIAL2 aggregation buffer.
   - **Terminate:** Returns the final result of the aggregation stored in the PARTIAL2 buffer to Hive.
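
To make the order of the five phases concrete, here is a minimal, library-free sketch of an average aggregate. It is only an illustration under stated assumptions: `AvgBuffer`, `ToyAverageEvaluator` and `run_partial_aggregation` are hypothetical names invented for this example, and none of this is the Hive or Spark API.

```python
from dataclasses import dataclass


@dataclass
class AvgBuffer:
    """Aggregation buffer holding a running sum and row count."""
    total: float = 0.0
    count: int = 0


class ToyAverageEvaluator:
    """Hypothetical stand-in for a UDAF evaluator; NOT the Hive API."""

    def initialize(self) -> AvgBuffer:
        # Initialize: create a fresh aggregation buffer.
        return AvgBuffer()

    def iterate(self, buf: AvgBuffer, value: float) -> None:
        # Iterate (Update): fold one input row into the buffer.
        buf.total += value
        buf.count += 1

    def terminate_partial(self, buf: AvgBuffer) -> tuple:
        # TerminatePartial: return the buffer contents as a partial result.
        return (buf.total, buf.count)

    def merge(self, buf: AvgBuffer, partial: tuple) -> None:
        # Merge: combine a partial result into the merging buffer.
        total, count = partial
        buf.total += total
        buf.count += count

    def terminate(self, buf: AvgBuffer):
        # Terminate: produce the final result (None when no rows were seen).
        return buf.total / buf.count if buf.count else None


def run_partial_aggregation(partitions):
    """Drive the five phases over a list of row lists (one per partition)."""
    evaluator = ToyAverageEvaluator()
    partials = []
    for rows in partitions:
        update_buf = evaluator.initialize()  # plays the PARTIAL1 buffer
        for row in rows:
            evaluator.iterate(update_buf, row)
        partials.append(evaluator.terminate_partial(update_buf))

    merge_buf = evaluator.initialize()  # plays the PARTIAL2 buffer
    for partial in partials:
        evaluator.merge(merge_buf, partial)
    return evaluator.terminate(merge_buf)
```

For instance, `run_partial_aggregation([[1.0, 2.0], [3.0]])` updates one buffer per partition, merges the two partial results, and then terminates on the merged buffer.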
   
   For the Hive UDAFs that do not support partial aggregation, I have seen the following three phases:
   - **Initialize:** Initialize the aggregation buffer.
   - **Iterate (Update):** Process the rows into the buffer.
   - **Terminate:** Return the final result.
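
The three-phase case can be sketched the same way. This is a rough illustration only: `ToyMaxEvaluator` and `run_complete_aggregation` are hypothetical names made up for the example, not part of Hive or Spark, and a single buffer sees every row because there is no partial result to merge.

```python
class ToyMaxEvaluator:
    """Hypothetical evaluator without partial aggregation; NOT the Hive API."""

    def initialize(self) -> dict:
        # Initialize: create the single aggregation buffer.
        return {"max": None}

    def iterate(self, buf: dict, value) -> None:
        # Iterate (Update): fold one input row into the buffer.
        if buf["max"] is None or value > buf["max"]:
            buf["max"] = value

    def terminate(self, buf: dict):
        # Terminate: return the final result straight from the buffer.
        return buf["max"]


def run_complete_aggregation(rows):
    """Drive initialize -> iterate -> terminate over all rows at once."""
    evaluator = ToyMaxEvaluator()
    buf = evaluator.initialize()
    for row in rows:
        evaluator.iterate(buf, row)
    return evaluator.terminate(buf)
```

Because no partial result is ever produced or merged, all rows of a group have to pass through the same buffer in this sketch.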
   
   For more information, you may find this link helpful: https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy
   
   This summary is based on what I found during my tests and from reading the docs, and it is what I used to model the behaviour of the class `HiveTypedImperativeAggregate`. I am by no means an expert on Hive, so if you feel that my summary of Hive UDAFs is incorrect or is missing something, please let me know.
