Posted to user@spark.apache.org by Alex Sulimanov <as...@Tremorvideo.com> on 2017/03/09 20:24:41 UTC

Distinct for Avro Key/Value PairRDD

Good day everyone!

Has anyone tried to de-duplicate records based on Avro-generated classes? These classes extend SpecificRecordBase, which implements equals and hashCode. However, when I call .distinct on my PairRDD (both key and value are Avro classes), it eliminates records that are NOT duplicates. Any help or suggestions are appreciated!
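distinct on a pair RDD treats two pairs as duplicates when both keys and both values are equal by equals/hashCode. One possible cause of the symptom above, offered here as an assumption rather than a diagnosis, is object reuse. If whatever reads or decodes the records reuses a single mutable instance, every pair ends up referencing the same object and the "duplicates" collapse. The Spark docs warn about exactly this for Hadoop RecordReaders in SparkContext.hadoopFile. The plain-Java sketch below shows the effect without Spark. UserRecord, read and distinctCount are hypothetical names, and the HashSet only stands in for the equals/hashCode contract that distinct depends on.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class DistinctReuseDemo {

    // Hypothetical stand-in for an Avro-generated class: mutable, with
    // value-based equals/hashCode (as SpecificRecordBase provides).
    static final class UserRecord {
        private String name;
        UserRecord(String name) { this.name = name; }
        void setName(String name) { this.name = name; }
        UserRecord copy() { return new UserRecord(name); }
        @Override public boolean equals(Object o) {
            return o instanceof UserRecord && Objects.equals(name, ((UserRecord) o).name);
        }
        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    // Hash-based de-duplication of key/value pairs: a pair is a duplicate
    // only if key.equals and value.equals both hold (Map.Entry contract).
    static int distinctCount(List<Map.Entry<UserRecord, UserRecord>> pairs) {
        Set<Map.Entry<UserRecord, UserRecord>> seen = new HashSet<>(pairs);
        return seen.size();
    }

    // Simulates a reader that mutates one key and one value instance per record.
    // With copyEach=false every pair references the same two objects.
    static List<Map.Entry<UserRecord, UserRecord>> read(String[] names, boolean copyEach) {
        UserRecord reusedKey = new UserRecord(null);
        UserRecord reusedValue = new UserRecord(null);
        List<Map.Entry<UserRecord, UserRecord>> out = new ArrayList<>();
        for (String n : names) {
            reusedKey.setName(n);
            reusedValue.setName(n.toUpperCase());
            UserRecord k = copyEach ? reusedKey.copy() : reusedKey;
            UserRecord v = copyEach ? reusedValue.copy() : reusedValue;
            out.add(new SimpleEntry<>(k, v));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "a", "c"};
        System.out.println(distinctCount(read(names, false))); // prints 1: all pairs alias one object
        System.out.println(distinctCount(read(names, true)));  // prints 3: only the repeated "a" is dropped
    }
}
```

If reuse turns out to be the cause, copying each record before calling distinct should restore the expected count. For example, you could use the generated builder copy (newBuilder(record).build()) or SpecificData.get().deepCopy(schema, record). This is a suggestion to verify against your Avro version, not a confirmed fix.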

Using Spark 2.0 with Kafka 2.10-0.8.2.0

Thanks and have a nice day!