Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/07/08 23:20:05 UTC

[jira] [Commented] (SPARK-7549) Support aggregating over nested fields

    [ https://issues.apache.org/jira/browse/SPARK-7549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619387#comment-14619387 ] 

Michael Armbrust commented on SPARK-7549:
-----------------------------------------

This seems underspecified to me.  Given the following:

{code}
val df = Seq((1, Seq(1,2,3)), (2, Seq(3,4,5))).toDF("a", "b")
{code}

What does {{df.select(min($"b"))}} return?  Is it {{1}} (the minimum over every element of every array) or {{1,3}} (the minimum of each row's array)?  If you want the former, I'd suggest you use {{explode}}.  If you want the latter, maybe we should just have UDFs that work for the cases that make sense instead of messing with the aggregation pathway.
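
To make the two readings concrete, here is a rough sketch of both. It assumes Spark 2.x or later with a local {{SparkSession}}; the names {{globalMin}}, {{arrayMin}}, and {{perRowMin}} are illustrative only and are not part of any Spark API. The UDF also assumes every array is non-null and non-empty.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, min, udf}

// Assumption: a local session; in spark-shell an equivalent `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("spark-7549-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, Seq(1, 2, 3)), (2, Seq(3, 4, 5))).toDF("a", "b")

// Reading 1: flatten every array into rows, then run the ordinary aggregate.
val globalMin: Int = df
  .select(explode($"b").as("elem"))
  .agg(min($"elem"))
  .first()
  .getInt(0)
// globalMin == 1

// Reading 2: a per-row UDF (hypothetical helper) that reduces each array on its own.
// Note: Seq.min throws on an empty sequence, and a null array is not handled here.
val arrayMin = udf((xs: Seq[Int]) => xs.min)

val perRowMin: Seq[(Int, Int)] = df
  .select($"a", arrayMin($"b").as("min_b"))
  .collect()
  .map(r => (r.getInt(0), r.getInt(1)))
  .toSeq
  .sortBy(_._1)
// perRowMin == Seq((1, 1), (2, 3))

spark.stop()
```

Neither reading touches the aggregation pathway: the first reuses {{min}} as-is after {{explode}}, and the second is a plain scalar function applied to each row.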

> Support aggregating over nested fields
> --------------------------------------
>
>                 Key: SPARK-7549
>                 URL: https://issues.apache.org/jira/browse/SPARK-7549
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> Would be nice to be able to run sum, avg, min, max (and other numeric aggregate expressions) on arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
