Posted to issues@spark.apache.org by "Eugene Zhulenev (JIRA)" <ji...@apache.org> on 2015/06/25 23:37:04 UTC

[jira] [Created] (SPARK-8645) Incorrect expression analysis with Hive

Eugene Zhulenev created SPARK-8645:
--------------------------------------

             Summary: Incorrect expression analysis with Hive
                 Key: SPARK-8645
                 URL: https://issues.apache.org/jira/browse/SPARK-8645
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.3.0
         Environment: CDH 5.4.2 1.3.0
            Reporter: Eugene Zhulenev


When a DataFrame is backed by a Hive table, groupBy followed by agg cannot resolve columns if I pass them as String rather than as Column.

This fails with: org.apache.spark.sql.AnalysisException: expression 'dt' is neither present in the group by, nor is it an aggregate function.

{code}
val grouped = eventLogHLL
      .groupBy(dt, ad_id, site_id).agg(
        dt,
        ad_id,
        col(site_id)             as site_id,
        sum(imp_count)           as imp_count,
        sum(click_count)         as click_count
      )
{code}

This works fine:
{code}
val grouped = eventLogHLL
      .groupBy(col(dt), col(ad_id), col(site_id)).agg(
        col(dt)                        as dt,
        col(ad_id)                     as ad_id,
        col(site_id)                   as site_id,
        sum(imp_count)                 as imp_count,
        sum(click_count)               as click_count
      )
{code}

Integration tests that run with "embedded" Spark and DataFrames generated from an RDD work fine.
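
Until the analysis issue is resolved, the working form above can be wrapped in a small helper so that every name is converted to a Column before it reaches groupBy and agg. This is only a sketch of a possible workaround: GroupByWorkaround and groupAndSum are hypothetical names, not part of Spark, and it assumes the Spark 1.3 behaviour in which grouping columns must be re-selected inside agg (later releases may retain them automatically, which would duplicate the key columns).

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical helper, not part of Spark: groups by the named key columns
// and sums the named measure columns, passing Column objects everywhere.
object GroupByWorkaround {
  def groupAndSum(df: DataFrame, keys: Seq[String], measures: Seq[String]): DataFrame = {
    require(keys.nonEmpty || measures.nonEmpty, "need at least one column")

    // Convert every grouping key from String to Column up front.
    val keyCols: Seq[Column] = keys.map(k => col(k))

    // Re-select the keys under their own names (assumes Spark 1.3 semantics),
    // then add one sum per measure, aliased to the measure's name.
    val selected: Seq[Column] =
      keys.map(k => col(k).as(k)) ++ measures.map(m => sum(col(m)).as(m))

    // agg takes one Column plus varargs, so split head and tail.
    df.groupBy(keyCols: _*).agg(selected.head, selected.tail: _*)
  }
}
```

Assuming dt, ad_id, site_id, imp_count and click_count are String values holding column names (as the col(site_id) calls suggest), the call would be GroupByWorkaround.groupAndSum(eventLogHLL, Seq(dt, ad_id, site_id), Seq(imp_count, click_count)).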
