Posted to user@spark.apache.org by Nkechi Achara <nk...@googlemail.com> on 2016/07/13 08:30:13 UTC

Spark, Kryo Serialization Issue with ProtoBuf field

Hi,

I am seeing an error relating to Kryo serialization of a protobuf field when my Spark job transforms an RDD.

com.esotericsoftware.kryo.KryoException:
java.lang.UnsupportedOperationException Serialization trace: otherAuthors_
(com.thomsonreuters.kraken.medusa.dbor.proto.Book$DBBooks)
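For reference, Kryo is enabled for the job through SparkConf in the usual way. The snippet below is only a minimal sketch of that configuration rather than the exact code; the DBBooks class name is taken from the stack trace above:

    import org.apache.spark.SparkConf
    import com.thomsonreuters.kraken.medusa.dbor.proto.Book

    // Minimal sketch of the Kryo setup; the real job sets more options.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering the generated class keeps full class names out of the
      // payload, but does not change how Kryo serialises its fields.
      .registerKryoClasses(Array(classOf[Book.DBBooks]))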

The error seems to be triggered at this point:

    val booksPerTier: Iterable[(TimeTier, RDD[DBBooks])] = allTiers.map { tier =>
      (tier, books
        .filter(b => isInTier(endOfInterval, tier, b) && !isBookPublished(b))
        .mapPartitions(it =>
          it.map(ord => (ord.getAuthor, ord.getPublisherName, getGenre(ord.getSourceCountry)))))
    }


    val averagesPerAuthor = booksPerTier.flatMap { case (tier, opt) =>
      opt.map(o => (tier, o._1, PublisherCompanyComparison, o._3)).countByValue()
    }

    val averagesPerPublisher = booksPerTier.flatMap { case (tier, opt) =>
      opt.map(o => (tier, o._1, PublisherComparison(o._2), o._3)).countByValue()
    }

The field is a list, which the generated protobuf class initialises as follows:

otherAuthors_ = java.util.Collections.emptyList()

As you can see, the code does not actually use that field from the Book protobuf, although the object is still being transmitted over the network.
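
One workaround I have been looking at, but have not tried yet, is registering a Kryo serializer for the generated class that round-trips through the protobuf wire format instead of letting Kryo walk the fields. This is only a sketch: it assumes the ProtobufSerializer from the chill-protobuf module, and BookKryoRegistrator is just a name I made up here:

    import com.esotericsoftware.kryo.Kryo
    import com.twitter.chill.protobuf.ProtobufSerializer
    import org.apache.spark.serializer.KryoRegistrator
    import com.thomsonreuters.kraken.medusa.dbor.proto.Book

    class BookKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        // Serialise DBBooks via toByteArray/parseFrom, so Kryo never tries to
        // rebuild the immutable otherAuthors_ list element by element.
        kryo.register(classOf[Book.DBBooks], new ProtobufSerializer())
      }
    }

with spark.kryo.registrator pointed at that class. I am not sure whether that is the right direction, though.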

Has anyone got any advice on this?

Thanks,

K