Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/07/08 23:20:05 UTC
[jira] [Commented] (SPARK-7549) Support aggregating over nested fields
[ https://issues.apache.org/jira/browse/SPARK-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619387#comment-14619387 ]
Michael Armbrust commented on SPARK-7549:
-----------------------------------------
This seems underspecified to me. Given the following:
{code}
val df = Seq((1, Seq(1,2,3)), (2, Seq(3,4,5))).toDF("a", "b")
{code}
What does {{df.select(min($"b"))}} return? Is it {{1}} or {{1,3}}? If you want the former, I'd suggest you use {{explode}}. If you want the latter, maybe we should just have UDFs that work for the cases that make sense instead of messing with the aggregation pathway.
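To make the two readings concrete, here is a rough sketch of both, not a proposed implementation. It assumes a newer Spark where {{SparkSession}} is available (the 2015-era equivalent would be a {{SQLContext}}); the object name {{NestedMinSketch}} and the helpers {{globalMin}} and {{perRowMin}} are made up for illustration, and the UDF assumes non-empty arrays with no nulls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, min, udf}

// Hypothetical names throughout; only explode, min, udf and col are Spark's.
object NestedMinSketch {

  // Reading 1: flatten every array into rows, then take one min over all elements.
  def globalMin(df: DataFrame): Int =
    df.select(explode(col("b")).as("elem"))
      .agg(min(col("elem")))
      .first()
      .getInt(0)

  // Reading 2: a plain UDF that reduces each row's array on its own.
  // Assumption: arrays are non-empty and non-null (Seq.min throws on empty).
  def perRowMin(df: DataFrame): Seq[Int] = {
    val arrayMin = udf((xs: Seq[Int]) => xs.min)
    df.orderBy(col("a"))                       // fixed order so the result is deterministic
      .select(arrayMin(col("b")).as("min_b"))
      .collect()
      .map(_.getInt(0))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-min-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, Seq(1, 2, 3)), (2, Seq(3, 4, 5))).toDF("a", "b")

    println(globalMin(df))  // the "1" reading
    println(perRowMin(df))  // the "1,3" reading
    spark.stop()
  }
}
```

Neither version touches the aggregation pathway: the first reuses the existing {{min}} aggregate after {{explode}}, and the second is an ordinary row-wise UDF.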
> Support aggregating over nested fields
> --------------------------------------
>
> Key: SPARK-7549
> URL: https://issues.apache.org/jira/browse/SPARK-7549
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
>
> Would be nice to be able to run sum, avg, min, max (and other numeric aggregate expressions) on arrays.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)