Posted to user@spark.apache.org by Megha Sridhar- Cynepia <me...@cynepia.com> on 2015/07/03 09:59:08 UTC

Filter on Grouped Data

Hi,


I have a Spark DataFrame object which, when trimmed, looks like this:



From                 To                               Subject            Message-ID
karen.den@xyz.com    ['vance.meyer@enron.com',        SEC Inquiry        <19952575.1075858>
                      'jeannie.mandelker@enron.com',
                      'mary.clark@enron.com',
                      'sarah.palmer@enron.com']

elyn.hughes@xyz.com  ['dennis.vegas@enron.com',       Revised documents  <33499184.1075858>
                      'gina.taylor@enron.com',
                      'kelly.kimberly@enron.com']
.
.
.


I have run groupBy("From") on the above DataFrame and obtained a
GroupedData object as a result. I need to apply a filter to the grouped
data (for instance, to find the sender who sent the largest number of
mails addressed to a particular receiver in the "To" list).
Is there a way to accomplish this by applying a filter to the grouped data?


Thanks,
Megha


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Filter on Grouped Data

Posted by Raghavendra Pandey <ra...@gmail.com>.
Why don't you apply the filter first, then group the data and run the
aggregations?
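
A minimal sketch of that order of operations, written in plain Python so
it runs without a Spark cluster. The rows, the helper name
top_sender_for, and the two extra "hypothetical" mails are assumptions
made for illustration; only the first two senders and their recipients
come from the sample in the question.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the DataFrame rows in the question.
# The last two rows are invented so that the counts differ per sender.
rows = [
    {"From": "karen.den@xyz.com",
     "To": ["vance.meyer@enron.com", "mary.clark@enron.com"],
     "Subject": "SEC Inquiry"},
    {"From": "elyn.hughes@xyz.com",
     "To": ["dennis.vegas@enron.com", "gina.taylor@enron.com"],
     "Subject": "Revised documents"},
    {"From": "karen.den@xyz.com",
     "To": ["mary.clark@enron.com"],
     "Subject": "hypothetical follow-up"},
    {"From": "elyn.hughes@xyz.com",
     "To": ["mary.clark@enron.com"],
     "Subject": "hypothetical reply"},
]


def top_sender_for(rows, receiver):
    """Return (sender, mail_count) for the sender who mailed `receiver`
    most often, or None if nobody mailed that receiver."""
    # Step 1: filter first, keeping only mails addressed to the receiver.
    matching = [row for row in rows if receiver in row["To"]]
    # Step 2: group by sender and aggregate (count mails per sender).
    counts = Counter(row["From"] for row in matching)
    if not counts:
        return None
    # Step 3: pick the group with the largest count.
    return counts.most_common(1)[0]


print(top_sender_for(rows, "mary.clark@enron.com"))
```

With a DataFrame the same shape would be roughly
df.filter(array_contains(df["To"], receiver)).groupBy("From").count(),
followed by a descending sort on the count column. This assumes "To" is
stored as an array column and that your Spark version provides
pyspark.sql.functions.array_contains; if "To" is a plain string, the
filter condition would need to change.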