Posted to user@spark.apache.org by Ping Tang <pt...@aerohive.com> on 2014/11/05 23:30:39 UTC

Question regarding sorting and grouping

Hi,

I’m working on a use case using Spark Streaming. I need to process an RDD of strings so that they are grouped by IP and, within each group, sorted by time. Could somebody tell me the right transformation?

Input:

2014-10-23 08:18:38,904 [192.168.10.1] bbbb
2014-10-23 08:18:38,907 [192.168.10.1] ccc
2014-10-23 08:18:39,910 [192.168.102.1] hhhh
2014-10-23 08:18:38,934 [192.168.10.1] eeee
2014-10-23 08:18:39,032 [192.168.102.1] ffff
2014-10-23 08:18:38,149 [192.168.10.1] aaaa
2014-10-23 08:18:39,582 [192.168.102.1] gggg
2014-10-23 08:18:38,691 [192.168.10.1] dddd

Expected result:

Array((192.168.10.1, ArrayBuffer(
2014-10-23 08:18:38,149 [192.168.10.1] aaaa,
2014-10-23 08:18:38,691 [192.168.10.1] dddd,
2014-10-23 08:18:38,904 [192.168.10.1] bbbb,
2014-10-23 08:18:38,907 [192.168.10.1] ccc,
2014-10-23 08:18:38,934 [192.168.10.1] eeee)),
(192.168.102.1, ArrayBuffer(
2014-10-23 08:18:39,032 [192.168.102.1] ffff,
2014-10-23 08:18:39,582 [192.168.102.1] gggg,
2014-10-23 08:18:39,910 [192.168.102.1] hhhh)))

Thanks

Ping
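
For reference, the requested shape (group by IP, then sort each group by timestamp) can be sketched on plain Python collections. This is only an illustration under stated assumptions, not a Spark answer from the list: the helper names `ip_of`, `timestamp_of`, and `group_and_sort` are hypothetical, and the code assumes every line starts with a fixed-width 23-character timestamp followed by the IP in square brackets, as in the sample input. On an RDD the same shape would presumably map onto keying each line by its IP, then `groupByKey()`, then `mapValues(...)` with a per-group sort.

```python
import re
from collections import defaultdict

# Assumption: the IP is the first bracketed token on each line.
IP_PATTERN = re.compile(r"\[([^\]]+)\]")

def ip_of(line):
    """Hypothetical helper: extract the bracketed IP from a log line."""
    return IP_PATTERN.search(line).group(1)

def timestamp_of(line):
    """Hypothetical helper: the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp.

    The stamp is fixed-width (23 characters), so comparing it as a
    string gives chronological order without parsing it.
    """
    return line[:23]

def group_and_sort(lines):
    """Group lines by IP, then sort each group by its timestamp."""
    groups = defaultdict(list)
    for line in lines:
        groups[ip_of(line)].append(line)
    return {ip: sorted(group, key=timestamp_of) for ip, group in groups.items()}

lines = [
    "2014-10-23 08:18:38,904 [192.168.10.1] bbbb",
    "2014-10-23 08:18:38,907 [192.168.10.1] ccc",
    "2014-10-23 08:18:39,910 [192.168.102.1] hhhh",
    "2014-10-23 08:18:38,934 [192.168.10.1] eeee",
    "2014-10-23 08:18:39,032 [192.168.102.1] ffff",
    "2014-10-23 08:18:38,149 [192.168.10.1] aaaa",
    "2014-10-23 08:18:39,582 [192.168.102.1] gggg",
    "2014-10-23 08:18:38,691 [192.168.10.1] dddd",
]

result = group_and_sort(lines)
for ip, group in result.items():
    print(ip)
    for line in group:
        print("  " + line)
```

Note that this sorts strictly by timestamp, so within 192.168.10.1 the `dddd` line (08:18:38,691) comes before `bbbb` (08:18:38,904). Grouping this way collects every line for an IP in memory at once, which is worth keeping in mind if a single IP can produce a very large number of lines per batch.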