Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2014/06/11 03:10:31 UTC

groupBy question

After doing a groupBy operation, I have the following result:

 val res =
   ("ID1", ArrayBuffer((1458046000001, "ID1", "japan")))
   ("ID3", ArrayBuffer((1458650800000, "ID3", "canada"),
                       (1458996400000, "ID3", "china")))
   ("ID2", ArrayBuffer((1457527600000, "ID2", "usa"),
                       (1459342000000, "ID2", "usa")))

Now, for each group, I need to output the size of the group and the max of
the first field, which is a timestamp.
So I tried the following:

1) res.map(group => (group._2.size, group._2._1.max))
But I got an error: value _1 is not a member of Iterable[(Long, String, String)]

2) I also tried: res.map(group => (group._2.size, group._2[1].max)), but got
an error for that as well.
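A minimal sketch of why both attempts fail, using a hypothetical local copy of one grouped element (the `TupleVsCollection` object and its values are illustrative, not from the thread). `group._2` is a collection (`Iterable`) of tuples, so `_1` is not a member of it; `_1` exists on each tuple inside. `group._2[1]` does not index either, because square brackets hold type parameters in Scala, and element access uses parentheses (`xs(0)`) on sequences. The first field has to be projected element by element. The timestamp literals also need an `L` suffix in Scala source, since they exceed the `Int` range.

```scala
object TupleVsCollection {
  // Hypothetical local copy of one grouped element: a Tuple2 of the key and
  // an Iterable of (timestamp, id, country) records, mirroring Spark's groupBy output type.
  // The L suffix is required: these values are too large for Int literals.
  val group: (String, Iterable[(Long, String, String)]) =
    ("ID3", Seq((1458650800000L, "ID3", "canada"), (1458996400000L, "ID3", "china")))

  // group._1 and group._2 compile because `group` itself is a tuple.
  val records: Iterable[(Long, String, String)] = group._2

  // records._1 would not compile: _1 belongs to tuples, not to Iterable.
  // Instead, take the first field of every tuple in the collection.
  val timestamps: Iterable[Long] = records.map(_._1)

  def main(args: Array[String]): Unit =
    println(s"${group._1}: timestamps=${timestamps.mkString(", ")}")
}
```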

What is the right way to get the max of the timestamp field (the first field
in the ArrayBuffer) for each group?


Thanks.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupBy-question-tp7357.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: groupBy question

Posted by SK <sk...@gmail.com>.
Great, thanks!




Re: groupBy question

Posted by Shuo Xiang <sh...@gmail.com>.
res.map(group => (group._2.size, group._2.map(_._1).max))
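The one-liner above can be tried without a cluster. This is a minimal sketch, assuming plain Scala collections stand in for the RDD (the `records` sequence and the `GroupSizeAndMax` object are hypothetical). `Seq.groupBy` builds a `Map` from each id to that id's records, and the same `size` / `map(_._1).max` expression is then applied per group.

```scala
object GroupSizeAndMax {
  // Hypothetical ungrouped input: (timestamp, id, country) records,
  // a local Seq standing in for the RDD discussed in the thread.
  val records: Seq[(Long, String, String)] = Seq(
    (1458046000001L, "ID1", "japan"),
    (1458650800000L, "ID3", "canada"),
    (1458996400000L, "ID3", "china"),
    (1457527600000L, "ID2", "usa"),
    (1459342000000L, "ID2", "usa")
  )

  // Group by the id field; each value holds all records for that id.
  val res: Map[String, Seq[(Long, String, String)]] = records.groupBy(_._2)

  // Per group: its size and the max of the first (timestamp) field.
  // The key is kept so each result stays tied to its group.
  val summary: Map[String, (Int, Long)] =
    res.map { case (id, recs) => (id, (recs.size, recs.map(_._1).max)) }

  def main(args: Array[String]): Unit =
    summary.toSeq.sortBy(_._1).foreach { case (id, (n, latest)) =>
      println(s"$id size=$n max=$latest")
    }
}
```

One design difference from the one-liner: this sketch keeps the key in each output pair. On a Scala `Map`, mapping every entry to a bare `(size, max)` pair would build a new `Map` keyed by size, so groups of equal size (ID2 and ID3 here) would collapse into a single entry. An RDD's `map` does not collapse results that way, but keeping the key still shows which group each pair belongs to.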

