Posted to user@spark.apache.org by goi cto <go...@gmail.com> on 2014/03/13 17:56:44 UTC

How to work with ReduceByKey?

Hi,

I have an RDD of <String, Tuple2<Integer, List<String>>> that I want to
reduceByKey so that, for each key, the integers are added and the lists are
collected into a list of lists.

BUT reduceByKey requires the return value to be of the same type as the
input, so I can't combine the lists this way.

JavaPairRDD<String, Tuple2<Integer, List<List<String>>>> callCount =
    byCaller.reduceByKey(
        new Function2<Tuple2<Integer, List<String>>,
                      Tuple2<Integer, List<String>>,
                      Tuple2<Integer, List<List<String>>>>() {
          public Tuple2<Integer, List<List<String>>> call(
              Tuple2<Integer, List<String>> i1,
              Tuple2<Integer, List<String>> i2) {
            Integer count = i1._1 + i2._1;
            List<List<String>> combinedList = new ArrayList<List<String>>(2);
            combinedList.add(i1._2);
            combinedList.add(i2._2);
            return new Tuple2<Integer, List<List<String>>>(count, combinedList);
          }
        });


Is there any solution for that?

-- 
Eran | CTO

Re: How to work with ReduceByKey?

Posted by Shixiong Zhu <zs...@gmail.com>.
Hi,

You can use "groupByKey + mapValues", e.g.,

JavaPairRDD<String, Tuple2<Integer, List<List<String>>>> callCount =
    byCaller
        .groupByKey()
        .mapValues(
            new Function<List<Tuple2<Integer, List<String>>>,
                         Tuple2<Integer, List<List<String>>>>() {

              @Override
              public Tuple2<Integer, List<List<String>>> call(
                  List<Tuple2<Integer, List<String>>> values) throws Exception {
                // Sum the counts and collect each value's list for this key.
                int count = 0;
                List<List<String>> l = new ArrayList<List<String>>();
                for (Tuple2<Integer, List<String>> value : values) {
                  count += value._1;
                  l.add(value._2);
                }
                return new Tuple2<Integer, List<List<String>>>(count, l);
              }

            });

Alternatively, use "combineByKey", which often performs better because it can
combine values on the map side before the shuffle.
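
To illustrate why "combineByKey" fits this case, here is a minimal sketch in
plain Java (no Spark dependency) of the three functions it takes. Unlike
reduceByKey, the combined type may differ from the input value type. The
names Call, Combined, entry, sample and the local combineByKey helper are
hypothetical stand-ins made up for this illustration; only the three roles
(createCombiner, mergeValue, mergeCombiners) mirror Spark's API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineByKeySketch {

    // Hypothetical stand-in for Tuple2<Integer, List<String>> (input value type V).
    public static final class Call {
        public final int count;
        public final List<String> items;
        public Call(int count, List<String> items) { this.count = count; this.items = items; }
    }

    // Hypothetical stand-in for Tuple2<Integer, List<List<String>>> (combined type C).
    public static final class Combined {
        public int count;
        public final List<List<String>> lists = new ArrayList<List<String>>();
    }

    // createCombiner (V -> C): wrap the first value seen for a key.
    public static Combined createCombiner(Call v) {
        Combined c = new Combined();
        c.count = v.count;
        c.lists.add(v.items);
        return c;
    }

    // mergeValue ((C, V) -> C): fold another value from the same partition in.
    public static Combined mergeValue(Combined c, Call v) {
        c.count += v.count;
        c.lists.add(v.items);
        return c;
    }

    // mergeCombiners ((C, C) -> C): merge partial results from two partitions.
    public static Combined mergeCombiners(Combined a, Combined b) {
        a.count += b.count;
        a.lists.addAll(b.lists);
        return a;
    }

    // Local simulation only: each inner list plays the role of one partition.
    public static Map<String, Combined> combineByKey(
            List<List<Map.Entry<String, Call>>> partitions) {
        Map<String, Combined> result = new LinkedHashMap<String, Combined>();
        for (List<Map.Entry<String, Call>> partition : partitions) {
            // Combine within the partition first.
            Map<String, Combined> partial = new LinkedHashMap<String, Combined>();
            for (Map.Entry<String, Call> rec : partition) {
                Combined c = partial.get(rec.getKey());
                partial.put(rec.getKey(), c == null
                        ? createCombiner(rec.getValue())
                        : mergeValue(c, rec.getValue()));
            }
            // Then merge the per-partition results across partitions.
            for (Map.Entry<String, Combined> e : partial.entrySet()) {
                Combined c = result.get(e.getKey());
                result.put(e.getKey(), c == null
                        ? e.getValue()
                        : mergeCombiners(c, e.getValue()));
            }
        }
        return result;
    }

    public static Map.Entry<String, Call> entry(String key, int count, String... items) {
        return new SimpleEntry<String, Call>(key, new Call(count, Arrays.asList(items)));
    }

    // Made-up sample data: two partitions, with key "alice" present in both.
    public static List<List<Map.Entry<String, Call>>> sample() {
        List<Map.Entry<String, Call>> p1 = new ArrayList<Map.Entry<String, Call>>();
        p1.add(entry("alice", 1, "a", "b"));
        p1.add(entry("bob", 2, "c"));
        List<Map.Entry<String, Call>> p2 = new ArrayList<Map.Entry<String, Call>>();
        p2.add(entry("alice", 2, "d"));
        List<List<Map.Entry<String, Call>>> partitions =
                new ArrayList<List<Map.Entry<String, Call>>>();
        partitions.add(p1);
        partitions.add(p2);
        return partitions;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Combined> e : combineByKey(sample()).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().count + " " + e.getValue().lists);
        }
    }
}
```

With Spark itself, the same three functions would be wrapped as one Function
and two Function2 instances over Tuple2 values and passed to
byCaller.combineByKey(createCombiner, mergeValue, mergeCombiners).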


Best Regards,
Shixiong Zhu

