Posted to user@spark.apache.org by Manoj Samel <ma...@gmail.com> on 2014/03/31 06:52:11 UTC

groupBy RDD does not have grouping column ?

Hi,

If I create a groupBy('a)(Sum('b) as 'foo, Sum('c) as 'bar), I would expect
the resulting RDD to have 'a, 'foo and 'bar.

Instead, the result RDD shows only 'foo and 'bar, and 'a is missing.

Thoughts?

Thanks,

Manoj

Re: groupBy RDD does not have grouping column ?

Posted by Manoj Samel <ma...@gmail.com>.
Thanks, that works.

It wasn't clear to me whether the second parameter list accepts only
aggregate specifications or any expression.


On Mon, Mar 31, 2014 at 9:03 AM, Michael Armbrust <mi...@databricks.com> wrote:

> This is similar to how SQL works: items in the GROUP BY clause are not
> included in the output by default. If you want 'a in the output, you also
> need to include it in the second parameter list (which is similar to the
> SELECT clause).

Re: groupBy RDD does not have grouping column ?

Posted by Michael Armbrust <mi...@databricks.com>.
This is similar to how SQL works: items in the GROUP BY clause are not
included in the output by default. If you want 'a in the output, you also
need to include it in the second parameter list (which is similar to the
SELECT clause).
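
For illustration, here is a minimal sketch of the corrected call. It assumes
the Spark 1.0-era SchemaRDD DSL (import sqlContext._ for the symbol
conversions, and Sum from the Catalyst expressions package); later Spark
releases replaced this API. The Record case class, the sample rows, and the
object name are hypothetical and not taken from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions.Sum

// Hypothetical row type; only the column names a, b and c come from the thread.
case class Record(a: String, b: Int, c: Int)

object GroupByWithKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("groupBy-with-key").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Assumption: this import provides the DSL implicits ('a, "as", ...)
    // as it did in the Spark 1.0-era SQLContext.
    import sqlContext._

    // Made-up sample data, converted explicitly to a SchemaRDD.
    val records = sc.parallelize(Seq(
      Record("x", 1, 10),
      Record("x", 2, 20),
      Record("y", 3, 30)))
    val table = sqlContext.createSchemaRDD(records)

    // First parameter list: the grouping expressions (like GROUP BY).
    // Second parameter list: the output expressions (like SELECT).
    // Listing 'a in the second list is what puts it in the result.
    val grouped = table.groupBy('a)('a, Sum('b) as 'foo, Sum('c) as 'bar)

    // Each output row now carries a, foo and bar; row order is not guaranteed.
    grouped.collect().foreach(println)

    sc.stop()
  }
}
```

Leaving 'a out of the second parameter list reproduces the behaviour
described in the question: the rows are still grouped by 'a, but only 'foo
and 'bar appear in the output.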

