Posted to user@spark.apache.org by marc nicole <mk...@gmail.com> on 2022/04/19 11:27:45 UTC

Grouping and counting occurrences of specific column rows

Hello guys,

I want to group a dataset (initDataset) by certain column attributes
(e.g., List<Column> groupByQidAttributes) and then count the occurrences
of the rows in each group. How do I achieve that neatly?
I tried the following code:

Dataset<Row> groupedRowsDF = initDataset.withColumn("qidsFreqs",
    count("*").over(Window.partitionBy(
        groupByQidAttributes.toArray(new Column[groupByQidAttributes.size()]))));

Is that OK to use for this purpose?

Re: Grouping and counting occurrences of specific column rows

Posted by marc nicole <mk...@gmail.com>.
I don't want to use groupBy, since I want to keep the rows separate for the
subsequent transformations. What I want is to group (I am using partitionBy
here) by several attributes while counting the frequency of each distinct
group of records (with respect to the attributes mentioned first).
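For illustration, here is a minimal self-contained sketch of that window-based count, assuming Spark's Java API running in local mode. The class name, the helper methods, and the sample columns (age, zip, disease) are hypothetical and only there to make the example runnable; the window expression itself is the one from the original question.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class QidFrequencies {

    // Adds a "qidsFreqs" column holding, for every row, the number of rows
    // that share the same values in the given grouping columns.
    // No rows are dropped or merged.
    public static Dataset<Row> withQidFreqs(Dataset<Row> initDataset,
                                            List<Column> groupByQidAttributes) {
        Column[] keys = groupByQidAttributes.toArray(new Column[0]);
        return initDataset.withColumn(
                "qidsFreqs", count("*").over(Window.partitionBy(keys)));
    }

    // Hypothetical sample data: two rows share (age, zip), one does not.
    public static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("age", DataTypes.IntegerType)
                .add("zip", DataTypes.StringType)
                .add("disease", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(30, "75001", "flu"),
                RowFactory.create(30, "75001", "cold"),
                RowFactory.create(41, "69002", "flu"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("qid-freqs")
                .getOrCreate();
        Dataset<Row> result = withQidFreqs(
                sampleData(spark), Arrays.asList(col("age"), col("zip")));
        result.show();
        spark.stop();
    }
}
```

With these sample rows, the two records sharing age 30 and zip 75001 each get qidsFreqs = 2, the remaining record gets 1, and all three rows are still present for later transformations.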

On Tue, Apr 19, 2022 at 2:06 PM Sean Owen <sr...@gmail.com> wrote:

> Just .groupBy(...).count() ?

Re: Grouping and counting occurrences of specific column rows

Posted by Sean Owen <sr...@gmail.com>.
Just .groupBy(...).count() ?
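
As a rough sketch of what that suggestion does, assuming Spark's Java API in local mode (the class name, helper methods, and sample columns are hypothetical): groupBy(...).count() returns one row per distinct key combination with a "count" column, so the individual rows are not kept.

```java
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class GroupByCount {

    // Collapses the dataset to one row per distinct key combination and
    // adds a "count" column with the number of rows in each group.
    public static Dataset<Row> countPerGroup(Dataset<Row> initDataset,
                                             List<Column> groupByQidAttributes) {
        Column[] keys = groupByQidAttributes.toArray(new Column[0]);
        return initDataset.groupBy(keys).count();
    }

    // Hypothetical sample data: two rows share (age, zip), one does not.
    public static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("age", DataTypes.IntegerType)
                .add("zip", DataTypes.StringType)
                .add("disease", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(30, "75001", "flu"),
                RowFactory.create(30, "75001", "cold"),
                RowFactory.create(41, "69002", "flu"));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("group-by-count")
                .getOrCreate();
        Dataset<Row> counts = countPerGroup(
                sampleData(spark), Arrays.asList(col("age"), col("zip")));
        counts.show();
        spark.stop();
    }
}
```

With these three sample rows the result has two rows, one per (age, zip) pair, and the disease column is gone. If each original row must carry its group frequency, the counts would have to be joined back on the key columns, which is what the count("*").over(Window.partitionBy(...)) form from the original question achieves in a single step.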
