Posted to user@spark.apache.org by Marco Mans <ma...@telemans.de> on 2017/02/20 11:23:50 UTC

Basic Grouping Question

Hi!

I'm new to Spark and trying to write my first Spark job on some data I have.
The data is in this (parquet) format:

Code, timestamp, value
A, 2017-01-01, 123
A, 2017-01-02, 124
A, 2017-01-03, 126
B, 2017-01-01, 127
B, 2017-01-02, 126
B, 2017-01-03, 123

I want to write a little map-reduce application that must be run on each
'code'.
So I would need to group the data on the 'code' column and then execute the
map and the reduce steps for each code; twice in this example, once for A and
once for B.

But when I group the data (with the groupBy function), it returns a
RelationalGroupedDataset, and on that I cannot apply the map and reduce
functions.
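
Roughly what I am trying (the path and column names are just made up from
the sample above):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("PerCodeJob").getOrCreate()
  val df = spark.read.parquet("/path/to/data.parquet") // Code, timestamp, value
  val grouped = df.groupBy("Code") // gives a RelationalGroupedDataset
  // grouped.map(...)              // <- this does not compile; there is no map here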

I have the feeling that I am heading in the wrong direction. Does anyone
know how to approach this? (I hope I explained it well enough to be
understood :))

Regards,
Marco

Re: Basic Grouping Question

Posted by ayan guha <gu...@gmail.com>.
Hi

Once you specify the aggregates on the groupBy function (I am assuming you
mean a DataFrame here?), both the grouping and the aggregation run in a
distributed fashion (you may want to look into how reduceByKey and/or
aggregateByKey work).
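
For example, something along these lines (untested sketch; column names are
taken from your sample, and I am assuming 'value' is a numeric column):

  import org.apache.spark.sql.functions._

  // DataFrame route: the aggregation runs distributed, once per code
  df.groupBy("Code").agg(avg("value"), max("value")).show()

  // RDD route, if you need arbitrary per-code logic: reduceByKey
  df.select("Code", "value").rdd
    .map(r => (r.getString(0), r.getLong(1))) // adjust getLong if value is an Int
    .reduceByKey(_ + _)
    .collect()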

-- 
Best Regards,
Ayan Guha