Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/06/28 12:23:04 UTC
[jira] [Updated] (SPARK-8645) Incorrect expression analysis with Hive
[ https://issues.apache.org/jira/browse/SPARK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-8645:
-----------------------------
Component/s: SQL
Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Set the component, for example.
> Incorrect expression analysis with Hive
> ---------------------------------------
>
> Key: SPARK-8645
> URL: https://issues.apache.org/jira/browse/SPARK-8645
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Environment: CDH 5.4.2 1.3.0
> Reporter: Eugene Zhulenev
> Labels: dataframe
>
> When a DataFrame is backed by a Hive table, groupBy with agg cannot resolve columns if I pass them as Strings rather than as Columns:
> This fails with: org.apache.spark.sql.AnalysisException: expression 'dt' is neither present in the group by, nor is it an aggregate function.
> {code}
> val grouped = eventLogHLL
>   .groupBy(dt, ad_id, site_id).agg(
>     dt,
>     ad_id,
>     col(site_id) as site_id,
>     sum(imp_count) as imp_count,
>     sum(click_count) as click_count
>   )
> {code}
> This works fine:
> {code}
> val grouped = eventLogHLL
>   .groupBy(col(dt), col(ad_id), col(site_id)).agg(
>     col(dt) as dt,
>     col(ad_id) as ad_id,
>     col(site_id) as site_id,
>     sum(imp_count) as imp_count,
>     sum(click_count) as click_count
>   )
> {code}
> Integration tests that run with an "embedded" Spark and DataFrames generated from RDDs work fine.