
Posted to issues@spark.apache.org by "James Aley (JIRA)" <ji...@apache.org> on 2015/07/29 14:28:04 UTC

[jira] [Created] (SPARK-9435) Java UDFs don't work with GROUP BY expressions

James Aley created SPARK-9435:
---------------------------------

             Summary: Java UDFs don't work with GROUP BY expressions
                 Key: SPARK-9435
                 URL: https://issues.apache.org/jira/browse/SPARK-9435
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.1
         Environment: All
            Reporter: James Aley
         Attachments: IncMain.java, points.txt

If you define a UDF in Java, for example by implementing the UDF1 interface, and then apply that UDF to a column in both the SELECT and GROUP BY clauses of a query, you get an error like this:

{code}

"SELECT inc(y),COUNT(DISTINCT x) FROM test_table GROUP BY inc(y)"

org.apache.spark.sql.AnalysisException: expression 'y' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.
{code}

The attached Java file (IncMain.java) is a minimal reproduction; it reads its data from the attached text file (points.txt).

My guess is that the problem lies in the equality implementation, so that Spark can't tell that the inc(y) in the SELECT clause and the inc(y) in the GROUP BY clause are the same expression. Doing the same thing from Scala works fine.
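
To make that guess concrete, here is a small plain-Java sketch of how this kind of equality failure can arise. It is not Spark code, and it has not been checked against Spark's implementation: UdfCall, wrapEachTime and wrapOnce are hypothetical names invented for the illustration. The assumption is that each reference to the UDF in the query gets wrapped in a fresh function object, so two structurally identical expressions compare as unequal.

```java
import java.util.Objects;
import java.util.function.Function;

public class UdfEqualitySketch {

    /** Hypothetical stand-in for an expression node: a function applied to a named column. */
    public static final class UdfCall {
        final Function<Object, Object> fn;
        final String column;

        UdfCall(Function<Object, Object> fn, String column) {
            this.fn = fn;
            this.column = column;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof UdfCall)) {
                return false;
            }
            UdfCall that = (UdfCall) other;
            // Function objects do not override equals(), so this is an identity check.
            return fn.equals(that.fn) && column.equals(that.column);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fn, column);
        }
    }

    /** Assumed problematic pattern: a new adapter object is created for every reference. */
    public static UdfCall wrapEachTime(final Function<Integer, Integer> udf, String column) {
        Function<Object, Object> adapter = new Function<Object, Object>() {
            @Override
            public Object apply(Object value) {
                return udf.apply((Integer) value);
            }
        };
        return new UdfCall(adapter, column);
    }

    /** Contrasting pattern: every reference reuses one shared adapter object. */
    public static UdfCall wrapOnce(Function<Object, Object> sharedAdapter, String column) {
        return new UdfCall(sharedAdapter, column);
    }

    public static void main(String[] args) {
        final Function<Integer, Integer> inc = x -> x + 1;

        // Same UDF, same column, but two distinct adapter objects.
        System.out.println(wrapEachTime(inc, "y").equals(wrapEachTime(inc, "y"))); // false

        // Same UDF, same column, one adapter object shared by both references.
        Function<Object, Object> shared = value -> inc.apply((Integer) value);
        System.out.println(wrapOnce(shared, "y").equals(wrapOnce(shared, "y"))); // true
    }
}
```

If something like the first pattern is happening on the Java UDF path, the analyzer would see the inc(y) in the SELECT list as a different expression from the inc(y) in the GROUP BY, which would match the error above.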

Note for context: we ran into this issue while working around SPARK-9338.



