Posted to user@spark.apache.org by Mihran Shahinian <sl...@gmail.com> on 2015/03/26 21:47:08 UTC
Fuzzy GroupBy
I would like to group records, but instead of grouping on an exact key I want
to compute the similarity of keys myself. Is there a recommended way of
doing this?
Here is my starting point:
final JavaRDD<pojo> records = spark.parallelize(getListofPojos()).cache();
class pojo {
    String prop1;  // the property whose similarity should drive the grouping
    String prop2;
}
During groupBy I would like to compute the similarity between the prop1
values of the pojos.
Much appreciated,
Mihran
Re: Fuzzy GroupBy
Posted by Sean Owen <so...@cloudera.com>.
The grouping is determined by the key's equals() and hashCode() methods. You
can also call groupBy() to group by some function of the POJOs. For example,
if you're grouping Doubles into nearly-equal bunches, you could group by
their .intValue().
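A rough sketch of that idea, not a definitive implementation: since grouping
needs exact key equality, "similar" values have to be mapped to the same
canonical key first. The example below uses plain java.util.stream so it runs
without a cluster; the same key functions could presumably be passed to
JavaRDD.groupBy(). The names FuzzyGroupBy, Pojo, canonicalKey, groupFuzzy and
bucketDoubles are hypothetical, and the normalization rule (trim, lower-case,
drop non-alphanumerics) is only an assumed stand-in for whatever similarity
you actually need.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FuzzyGroupBy {

    // Mirrors the POJO from the question (hypothetical constructor added).
    static class Pojo {
        final String prop1;
        final String prop2;

        Pojo(String prop1, String prop2) {
            this.prop1 = prop1;
            this.prop2 = prop2;
        }
    }

    // Assumed similarity rule: values are "similar" when they are equal
    // after trimming, lower-casing and removing non-alphanumeric characters.
    static String canonicalKey(String prop1) {
        return prop1.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Group records by the canonical form of prop1 instead of prop1 itself.
    static Map<String, List<Pojo>> groupFuzzy(List<Pojo> records) {
        return records.stream().collect(
                Collectors.groupingBy(p -> canonicalKey(p.prop1),
                        TreeMap::new, Collectors.toList()));
    }

    // The Doubles example from the reply: bucket by the truncated int value.
    static Map<Integer, List<Double>> bucketDoubles(List<Double> values) {
        return values.stream().collect(
                Collectors.groupingBy(Double::intValue,
                        TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Pojo> records = Arrays.asList(
                new Pojo("Foo-Bar", "a"),
                new Pojo(" foobar ", "b"),
                new Pojo("Baz", "c"));
        groupFuzzy(records).forEach(
                (key, group) -> System.out.println(key + " -> " + group.size()));

        System.out.println(bucketDoubles(Arrays.asList(1.1, 1.9, 2.0)));
    }
}
```

Note that a key function like this only approximates similarity. intValue()
truncates toward zero, so 1.9 and 2.0 land in different buckets even though
they are close, while 1.1 and 1.9 share one. If the similarity you want is
pairwise (for example an edit-distance threshold), it is generally not
transitive, and a single groupBy key cannot express it exactly.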