Posted to issues@spark.apache.org by "Faisal (JIRA)" <ji...@apache.org> on 2017/02/08 21:35:41 UTC
[jira] [Created] (SPARK-19519) Groupby for multiple columns not working
Faisal created SPARK-19519:
------------------------------
Summary: Groupby for multiple columns not working
Key: SPARK-19519
URL: https://issues.apache.org/jira/browse/SPARK-19519
Project: Spark
Issue Type: Bug
Components: Java API
Affects Versions: 1.5.0
Reporter: Faisal
Priority: Blocker
DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
    .join(moduleCodeDf.as("mc"), moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
    .join(dictDfCharCode.as("dc"), dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
    .join(dictDfIsAChar, dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));

joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
        col("dc.propVal").as("mcaCtypeCode"),
        max(col("mod.updatedDate")).as("mcaLastChangedDate"),
        coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
            max(when(col("mndtryInd").equalTo("N"), "N")),
            max(col("mndtryInd"))).as("mcaMandatoryFlg"),
        lit("N").as("mcaLockedFlg"),
        coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
            max(when(col("fldColInd").equalTo("N"), "I")),
            max(col("fldColInd"))).as("mcaFieldCollectionFlg")
    ).groupBy(col("mc.propVal"), col("dc.propVal"))
    .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));
The code above throws the following exception:
User class threw exception: org.apache.spark.sql.AnalysisException: expression 'propVal' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.
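
One possible reading of this exception, offered as an assumption rather than a confirmed diagnosis: the select(...) call mixes plain columns with max(...) before any grouping has been defined, so the aggregates are evaluated without a group-by and the ungrouped propVal columns are rejected. The usual ordering is to call groupBy(...) first and pass the aggregate expressions to agg(...). The sketch below models that ordering with plain Java collections (it is not the Spark API, so it runs without a Spark dependency). The names Row, Summary, aggregate, and maxOf are hypothetical, the sample values are made up, and only the mndtryInd flag is modelled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (not Spark API) of "group first, then aggregate".
// All class, method, and field names here are hypothetical stand-ins.
final class GroupThenAggregate {

    /** One joined input row: two grouping keys plus the values to aggregate. */
    static final class Row {
        final String modCode;     // stands in for mc.propVal
        final String ctypeCode;   // stands in for dc.propVal
        final String updatedDate; // ISO date string, so string order matches date order
        final String mndtryInd;   // "Y", "N", some other value, or null

        Row(String modCode, String ctypeCode, String updatedDate, String mndtryInd) {
            this.modCode = modCode;
            this.ctypeCode = ctypeCode;
            this.updatedDate = updatedDate;
            this.mndtryInd = mndtryInd;
        }
    }

    /** Running aggregates for one (modCode, ctypeCode) group. */
    static final class Summary {
        String lastChangedDate; // max(updatedDate)
        boolean sawY;           // at least one row had mndtryInd = "Y"
        boolean sawN;           // at least one row had mndtryInd = "N"
        String maxInd;          // max(mndtryInd), nulls ignored

        void add(Row r) {
            lastChangedDate = maxOf(lastChangedDate, r.updatedDate);
            if ("Y".equals(r.mndtryInd)) sawY = true;
            if ("N".equals(r.mndtryInd)) sawN = true;
            maxInd = maxOf(maxInd, r.mndtryInd);
        }

        // Models coalesce(max(when(ind = "Y", "Y")), max(when(ind = "N", "N")), max(ind)).
        String mandatoryFlg() {
            if (sawY) return "Y";
            if (sawN) return "N";
            return maxInd;
        }
    }

    /** Null-ignoring maximum of two strings, like SQL max over two values. */
    static String maxOf(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Step 1: bucket rows by the composite key. Step 2: fold each row into its bucket. */
    static Map<List<String>, Summary> aggregate(List<Row> rows) {
        Map<List<String>, Summary> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            List<String> key = Arrays.asList(r.modCode, r.ctypeCode);
            groups.computeIfAbsent(key, k -> new Summary()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("M1", "C1", "2017-01-01", "N"),
            new Row("M1", "C1", "2017-02-01", "Y"),
            new Row("M1", "C2", "2016-12-31", "N"));
        for (Map.Entry<List<String>, Summary> e : aggregate(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().lastChangedDate
                + ", " + e.getValue().mandatoryFlg());
        }
    }
}
```

In the model every output value is either a grouping key or computed per group, which is the condition the exception message describes. If the same reasoning applies to the reported query, the non-aggregate columns would go into groupBy(...) and the max/coalesce expressions into agg(...), with any renaming done in a select afterwards.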