Posted to issues@spark.apache.org by "Parth Gandhi (JIRA)" <ji...@apache.org> on 2018/07/26 14:00:03 UTC

[jira] [Created] (SPARK-24935) Problem with Executing Hive UDF's from Spark 2.2 Onwards

Parth Gandhi created SPARK-24935:
------------------------------------

             Summary: Problem with Executing Hive UDF's from Spark 2.2 Onwards
                 Key: SPARK-24935
                 URL: https://issues.apache.org/jira/browse/SPARK-24935
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1, 2.2.0
            Reporter: Parth Gandhi


A user of the sketches library (https://github.com/DataSketches/sketches-hive) reported an issue with the HLL Sketch Hive UDAF that appears to be a bug in Spark or Hive. Their code runs fine in 2.1 but fails from 2.2 onwards. For more details on the issue, refer to the discussion on the sketches-user list:
[https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ]

 

On further debugging, we found that from 2.2 onwards, Spark's Hive UDAF support performs partial aggregation and no longer supports complete-mode aggregation (refer to https://issues.apache.org/jira/browse/SPARK-19060 and https://issues.apache.org/jira/browse/SPARK-18186). Thus, instead of the expected update method, the merge method is called here ([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56]), which throws the exception described in the forum thread above.
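The difference between the two execution styles can be illustrated with a small toy model. Hive's GenericUDAFEvaluator defines four modes (PARTIAL1, PARTIAL2, FINAL, COMPLETE), each of which determines whether iterate() or merge() receives the input and whether terminatePartial() or terminate() produces the output. The sketch below is a simplified simulation under that assumption, not Spark or Hive code: the names CountEvaluator, IterateOnlyEvaluator, run_complete and run_partial are hypothetical, and it does not reproduce the actual SketchEvaluator failure discussed in the thread. It only shows that an evaluator which never expects merge() works in a single-phase (complete-mode) plan but fails once a two-phase (partial) plan starts calling merge().

```python
# Toy model of Hive's UDAF evaluator modes (simplified; not the real Hive API).
# Each mode maps to (method that consumes input, method that produces output).
MODE_METHODS = {
    "PARTIAL1": ("iterate", "terminatePartial"),  # raw rows -> partial result
    "PARTIAL2": ("merge", "terminatePartial"),    # partials -> partial result
    "FINAL": ("merge", "terminate"),              # partials -> final result
    "COMPLETE": ("iterate", "terminate"),         # raw rows -> final result
}


class CountEvaluator:
    """Hypothetical evaluator that counts rows and records which methods run."""

    def __init__(self, calls):
        self.calls = calls

    def new_buffer(self):
        return [0]

    def iterate(self, buf, row):
        self.calls.append("iterate")
        buf[0] += 1

    def terminate_partial(self, buf):
        self.calls.append("terminatePartial")
        return buf[0]

    def merge(self, buf, partial):
        self.calls.append("merge")
        buf[0] += partial

    def terminate(self, buf):
        self.calls.append("terminate")
        return buf[0]


class IterateOnlyEvaluator(CountEvaluator):
    """Hypothetical evaluator written assuming merge() is never called."""

    def merge(self, buf, partial):
        self.calls.append("merge")
        raise RuntimeError("merge() called, but this evaluator only expected iterate()")


def run_complete(evaluator, rows):
    # Single-phase plan (COMPLETE): every raw row goes through iterate(),
    # then terminate() produces the final result.
    buf = evaluator.new_buffer()
    for row in rows:
        evaluator.iterate(buf, row)
    return evaluator.terminate(buf)


def run_partial(evaluator, partitions):
    # Two-phase plan: PARTIAL1 on each partition, then FINAL merges the partials.
    partials = []
    for part in partitions:
        buf = evaluator.new_buffer()
        for row in part:
            evaluator.iterate(buf, row)
        partials.append(evaluator.terminate_partial(buf))
    final_buf = evaluator.new_buffer()
    for partial in partials:
        evaluator.merge(final_buf, partial)
    return evaluator.terminate(final_buf)


if __name__ == "__main__":
    print(run_complete(IterateOnlyEvaluator([]), ["a", "b"]))  # 2
    try:
        run_partial(IterateOnlyEvaluator([]), [["a"], ["b"]])
    except RuntimeError as err:
        print("partial plan failed:", err)
```

In this model the same evaluator produces the right answer under the single-phase plan and raises under the two-phase plan, because only the latter routes data through merge().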


