You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/01/05 20:52:30 UTC
Why does saveAfObjectFile() serialize Array[T] instead of T?
Hi,
Given an RDD[T] instance, saveAfObjectFile() passes an instance of Array[T]
to serialize(), and not and instance of T:
def saveAsObjectFile(path: String) {
this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
.map(*x* => (NullWritable.get(), new BytesWritable(
*Utils.serialize(x)*)))
.saveAsSequenceFile(path)
}
Is this array mapping for efficiency reasons, or are there other reasons
for this?
I'm trying to use saveAfObjectFile() to serialize protobuf messages.
Protobuf messages already come with a method that turns them into
Array[Byte] (see
here<https://github.com/osmandapp/Osmand/blob/master/OsmAnd-java/src/com/google/protobuf/AbstractMessageLite.java#L60>),
that is, toByteArray() can be clled on an instance of T. Considering this,
how can a protobuf message instance be serialized in a custom version of
saveAsObjectFile()?
Is it a good idea to drop array mapping?:
def saveAsObjectFile(path: String) {
this.map(x => (NullWritable.get(), new BytesWritable(*x.toByteArray()*
)))
.saveAsSequenceFile(path)
}