Posted to user@flink.apache.org by nsengupta <se...@gmail.com> on 2016/05/01 12:48:44 UTC

Re: Count of Grouped DataSet

Hello all,

This is how I have moved ahead with the implementation of finding count of a
GroupedDataSet:

    val k = envDefault
      .fromElements((1,1,2,"A"),(1,1,2,"B"),(2,1,3,"B"),(3,1,4,"C"))
      .groupBy(1,2)
      .reduceGroup(nextGroup => {
        val asList = nextGroup.toList
        (asList.head._2,asList.head._3,asList.size)
      })

    k.print()

While this produces the expected output, I am not sure if this is the
ideal, idiomatic way to implement what I need. Could you please confirm? If
there is a better way, I would like to be wiser, of course.

-- Nirmalya



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Count-of-Grouped-DataSet-tp6592p6594.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Count of Grouped DataSet

Posted by nsengupta <se...@gmail.com>.
Hello Fabian,

Thanks for taking the time to provide your recommendation. This is how I
have implemented it:

case class Something(f1: Int, f2: Int, f3: Int, f4: String) // My application's data structure

    val k = envDefault
      .fromElements(Something(1,1,2,"A"),Something(1,1,2,"B"),Something(2,1,3,"A"),Something(3,1,4,"C"))
      .map(e => (e.f1, e.f2, e.f3, e.f4, 1))  // I create a temporary tuple
      .groupBy(1,2)
      .sum(4)
      .map(e => (Something(e._1,e._2,e._3,e._4),e._5))
      .print()

The output is:
(Something(2,1,3,A),1)
(Something(1,1,2,B),2)
(Something(3,1,4,C),1)

I need to create a temporary tuple because I need to group by fields of the
case class, yet sum the fifth (newly added) field. Somehow, I feel this is
clunky!

Is this a preferred way? Is there a better (performant, yet idiomatic) way?
Please make me wiser.
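
One possible alternative, sketched here under assumptions rather than taken
from the thread: pair each Something with a 1, group with a key-selector
function on f2 and f3, and merge the pairs with reduce(), so that no
five-field tuple is needed. The names GroupCountWithReduce and counts are
hypothetical, and an ExecutionEnvironment is passed in explicitly instead of
using envDefault. Which Something instance is kept for a group is arbitrary
in this sketch.

```scala
import org.apache.flink.api.scala._

// Same shape as the application's data structure in the message above.
case class Something(f1: Int, f2: Int, f3: Int, f4: String)

// Hypothetical object and method names, used only for illustration.
object GroupCountWithReduce {

  // Returns one (representative Something, count) pair per distinct (f2, f3).
  def counts(env: ExecutionEnvironment): Seq[(Something, Int)] =
    env
      .fromElements(
        Something(1, 1, 2, "A"),
        Something(1, 1, 2, "B"),
        Something(2, 1, 3, "A"),
        Something(3, 1, 4, "C"))
      // Attach a count of 1 to every record.
      .map(s => (s, 1))
      // Key selector: group by the case-class fields f2 and f3.
      .groupBy(p => (p._1.f2, p._1.f3))
      // Keep one representative record and add up the counts.
      .reduce((a, b) => (a._1, a._2 + b._2))
      .collect()

  def main(args: Array[String]): Unit =
    counts(ExecutionEnvironment.getExecutionEnvironment).foreach(println)
}
```

The design choice here is to keep the case class intact throughout, at the
cost of writing the merge function by hand instead of calling sum().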

-- Nirmalya




Re: Count of Grouped DataSet

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Nirmalya,

The solution with List.size() won't use a combiner and won't be efficient
for large data sets with large groups.
I would recommend adding a 1 to each record and using GroupedDataSet.sum().
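
A minimal sketch of that recommendation, assuming the same four input tuples
as in the original question. The names GroupCountWithSum and counts are
hypothetical, and the environment is passed in explicitly. Each record is
mapped to its two grouping fields plus a constant 1, and the 1s are then
summed per group.

```scala
import org.apache.flink.api.scala._

// Hypothetical object and method names, used only for illustration.
object GroupCountWithSum {

  // Returns (second field, third field, count) for every distinct key.
  def counts(env: ExecutionEnvironment): Seq[(Int, Int, Int)] =
    env
      .fromElements((1, 1, 2, "A"), (1, 1, 2, "B"), (2, 1, 3, "B"), (3, 1, 4, "C"))
      // Keep only the grouping fields and append a constant 1 to be summed.
      .map(t => (t._2, t._3, 1))
      // Group on the two key fields (positions 0 and 1 of the new tuple).
      .groupBy(0, 1)
      // Sum the appended 1s to obtain the group size.
      .sum(2)
      .collect()

  def main(args: Array[String]): Unit =
    counts(ExecutionEnvironment.getExecutionEnvironment).foreach(println)
}
```

With this input, the key (1,2) should get a count of 2 and the keys (1,3) and
(1,4) a count of 1 each; the order of the results is not guaranteed.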

2016-05-01 12:48 GMT+02:00 nsengupta <se...@gmail.com>:

> Hello all,
>
> This is how I have moved ahead with the implementation of finding count of
> a GroupedDataSet:
>
> val k = envDefault
>       .fromElements((1,1,2,"A"),(1,1,2,"B"),(2,1,3,"B"),(3,1,4,"C"))
>       .groupBy(1,2)
>       .reduceGroup(nextGroup => {
>         val asList = nextGroup.toList
>         (asList.head._2,asList.head._3,asList.size)
>       })
>
> k.print()
>
> While this produces the expected output, I am not sure if this is the
> ideal, idiomatic way to implement what I need. Could you please confirm? If
> there is a better way, I would like to be wiser, of course.
>
> -- Nirmalya