Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/03/16 13:08:41 UTC
[jira] [Commented] (SPARK-19519) Groupby for multiple columns not working

    [ https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927979#comment-15927979 ] 

Hyukjin Kwon commented on SPARK-19519:
--------------------------------------

Could you provide a self-contained reproducer? The details provided seem to depend heavily on the original data. I am willing to help and verify.

BTW, it does not look like a {{Blocker}}. 

> Groupby for multiple columns not working
> ----------------------------------------
>
>                 Key: SPARK-19519
>                 URL: https://issues.apache.org/jira/browse/SPARK-19519
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.5.0
>            Reporter: Faisal
>            Priority: Blocker
>
> Please look at the join below between multiple DataFrames. Applying groupBy on multiple columns with a max aggregate does not yield results; instead it throws this exception: User class threw exception: org.apache.spark.sql.AnalysisException: expression 'propVal' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.
> DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
>     .join(moduleCodeDf.as("mc"), moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
>     .join(dictDfCharCode.as("dc"), dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
>     .join(dictDfIsAChar, dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
> joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
>         col("dc.propVal").as("mcaCtypeCode"),
>         max(col("mod.updatedDate")).as("mcaLastChangedDate"),
>         coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
>                  max(when(col("mndtryInd").equalTo("N"), "N")),
>                  max(col("mndtryInd"))).as("mcaMandatoryFlg"),
>         lit("N").as("mcaLockedFlg"),
>         coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
>                  max(when(col("fldColInd").equalTo("N"), "I")),
>                  max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
>     .groupBy(col("mc.propVal"), col("dc.propVal"))
>     .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));
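
For reference, a self-contained reproducer could look roughly like the sketch below. It is not taken from the reporter's job: the class name, the flattened column names and the three sample rows are hypothetical, and it assumes a Spark 1.5/1.6 classpath with a local master. It also encodes one possible reading of the error, namely that the aggregates are placed in select() before groupBy(), so the analyzer sees 'propVal' next to max(...) with no grouping in effect. The sketch groups first and moves every aggregate into agg(); whether that matches the original job would still need to be confirmed against the real data.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.coalesce;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.when;

// Hypothetical reproducer class; names and data are illustrative only.
public class GroupByReproducer {

    // Builds a tiny stand-in for the joined DataFrame (assumed column names).
    static DataFrame sample(JavaSparkContext jsc, SQLContext sqlContext) {
        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("mcaModCode", DataTypes.StringType, false),
            DataTypes.createStructField("mcaCtypeCode", DataTypes.StringType, false),
            DataTypes.createStructField("updatedDate", DataTypes.StringType, false),
            DataTypes.createStructField("mndtryInd", DataTypes.StringType, true)
        });
        JavaRDD<Row> rows = jsc.parallelize(Arrays.asList(
            RowFactory.create("M1", "C1", "2017-01-01", "N"),
            RowFactory.create("M1", "C1", "2017-02-01", "Y"),
            RowFactory.create("M2", "C1", "2017-01-15", "N")));
        return sqlContext.createDataFrame(rows, schema);
    }

    // Group by both key columns first, then compute every aggregate in agg().
    // The constant column is added afterwards because it is not an aggregate.
    static DataFrame aggregate(DataFrame joined) {
        return joined
            .groupBy(col("mcaModCode"), col("mcaCtypeCode"))
            .agg(
                max(col("updatedDate")).as("mcaLastChangedDate"),
                coalesce(
                    max(when(col("mndtryInd").equalTo("Y"), "Y")),
                    max(when(col("mndtryInd").equalTo("N"), "N")),
                    max(col("mndtryInd"))).as("mcaMandatoryFlg"))
            .withColumn("mcaLockedFlg", lit("N"));
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setMaster("local[1]")
            .setAppName("SPARK-19519-sketch");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            SQLContext sqlContext = new SQLContext(jsc);
            // One output row per distinct (mcaModCode, mcaCtypeCode) pair.
            aggregate(sample(jsc, sqlContext)).show();
        } finally {
            jsc.stop();
        }
    }
}
```

If a small program like this behaves correctly while the original job still fails, the difference most likely lies in how the select() and groupBy() calls are ordered in the job, or in the joined data itself.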
