Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/01/07 01:36:37 UTC

How to access global kryo instance?

Hi,

Is there a way to access the global Kryo instance created by Spark? I'm
referring to the one that is passed to registerClasses() in a
KryoRegistrator subclass.

I'd like to access this Kryo instance inside a map closure, so it should be
accessible on the worker side too.

Re: How to access global kryo instance?

Posted by Aaron Davidson <il...@gmail.com>.
I see -- the answer is no. We do not currently use an object pool; instead
we just try to create the instance less frequently (typically one
SerializerInstance per partition). For instance, you could do:

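import org.apache.spark.SparkEnv
import org.apache.spark.serializer.KryoSerializer

rdd.mapPartitions { partitionIterator =>
  // newKryo() is defined on KryoSerializer, so cast the generic Serializer first
  val kryo = SparkEnv.get.serializer.asInstanceOf[KryoSerializer].newKryo()
  partitionIterator.map(row => doWorkWithKryo(kryo, row))
}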

This should amortize the cost greatly. The only requirement of an instance
is that it not be used by multiple threads simultaneously, and this fits
that requirement perfectly.
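
If a closure cannot easily be restructured around mapPartitions, another way to meet the one-thread-at-a-time requirement is a lazily created per-thread instance. The following is only a minimal sketch, not something Spark provides: the object name KryoHolder is hypothetical, and it assumes spark.serializer is set to KryoSerializer so that the cast succeeds.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkEnv
import org.apache.spark.serializer.KryoSerializer

// Hypothetical helper, not part of Spark: keeps one Kryo per executor thread.
object KryoHolder {
  private val local = new ThreadLocal[Kryo] {
    // Runs once per thread, on first access. Assumes the configured
    // serializer is a KryoSerializer; otherwise the cast throws.
    override def initialValue(): Kryo =
      SparkEnv.get.serializer.asInstanceOf[KryoSerializer].newKryo()
  }

  // Returns the calling thread's own Kryo, creating it if needed.
  def get: Kryo = local.get()
}
```

Inside a closure this would be used as rdd.map(row => doWorkWithKryo(KryoHolder.get, row)). Each thread only ever sees its own instance, so no instance is shared between threads.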



Re: How to access global kryo instance?

Posted by Aureliano Buendia <bu...@gmail.com>.
On Tue, Jan 7, 2014 at 2:52 AM, Aaron Davidson <il...@gmail.com> wrote:

> Please take a look at the source code -- it's relatively friendly, and
> very useful for digging into Spark internals! (KryoSerializer:
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala)
>
> As you can see, a Kryo instance is available via ser.newKryo(). You can
> also use Spark's SerializerInstance interface (which features serialize()
> and deserialize() methods) by simply calling ser.newInstance().
>

Sorry, maybe I wasn't clear. What I meant was: does Spark use a singleton
instance of Kryo that can be accessed inside the map closure?

Calling ser.newKryo() for every element (inside a map closure) has a
huge overhead, and it seems newKryo() doesn't do any caching. Twitter's
chill uses an object pool for Kryo instances; I'm not sure how Spark
handles this.
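
For readers unfamiliar with the object pool idea mentioned above, here is a minimal generic sketch of it. It is neither chill's implementation nor anything Spark ships; the class name SimplePool and its methods are hypothetical and use only the JDK's ConcurrentLinkedQueue.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical minimal pool: hands out idle instances and creates new ones on demand.
final class SimplePool[T <: AnyRef](create: () => T) {
  private val idle = new ConcurrentLinkedQueue[T]()

  // Reuse an idle instance if there is one, otherwise build a fresh one.
  def borrow(): T = {
    val pooled = idle.poll() // poll() returns null when the queue is empty
    if (pooled != null) pooled else create()
  }

  // Hand an instance back so that a later borrow() can reuse it.
  def release(instance: T): Unit = idle.offer(instance)

  // Borrow an instance, run f with it, and always give it back.
  def withInstance[A](f: T => A): A = {
    val instance = borrow()
    try f(instance) finally release(instance)
  }
}
```

With Kryo this might be constructed as new SimplePool[Kryo](() => ser.newKryo()), where ser is the KryoSerializer from earlier in the thread. A borrowed instance is held by one caller at a time, which is the property that matters for an object that is not thread-safe.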



Re: How to access global kryo instance?

Posted by Aaron Davidson <il...@gmail.com>.
Please take a look at the source code -- it's relatively friendly, and very
useful for digging into Spark internals! (KryoSerializer:
https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala)

As you can see, a Kryo instance is available via ser.newKryo(). You can
also use Spark's SerializerInstance interface (which features serialize()
and deserialize() methods) by simply calling ser.newInstance().
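
As a rough illustration of the SerializerInstance route, the sketch below serializes and then deserializes every row of an RDD[String]. The object and method names (RoundTrip, roundTrip) are hypothetical, and the exact serialize()/deserialize() signatures may differ slightly between Spark versions, so treat this as a sketch rather than a reference.

```scala
import org.apache.spark.SparkEnv
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration only.
object RoundTrip {
  def roundTrip(rdd: RDD[String]): RDD[String] =
    rdd.mapPartitions { rows =>
      // Created once per partition, then reused for every row in it.
      val instance = SparkEnv.get.serializer.newInstance()
      rows.map { row =>
        val bytes = instance.serialize(row) // a java.nio.ByteBuffer
        instance.deserialize[String](bytes)
      }
    }
}
```

This works with whichever serializer is configured through spark.serializer, so it needs no cast to KryoSerializer.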



Re: How to access global kryo instance?

Posted by Aureliano Buendia <bu...@gmail.com>.
In a map closure, I could use:

val ser = SparkEnv.get.serializer.asInstanceOf[KryoSerializer]

But how do I get the instance of Kryo that Spark uses from ser?



Re: How to access global kryo instance?

Posted by Aaron Davidson <il...@gmail.com>.
I believe SparkEnv.get.serializer would return the serializer created from
the "spark.serializer" property.

You can also obtain a Kryo serializer directly via its no-arg constructor
(it still invokes your spark.kryo.registrator):
val serializer = new KryoSerializer()
but this could have some overhead, so it should probably not be done for
every element you process.
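
For context, the two properties mentioned above are typically wired up along the following lines. This is a sketch under assumptions: MyRecord, MyRegistrator and KryoSetup are hypothetical names, and setting the values as system properties before the SparkContext is created is only one way to supply them (later Spark releases usually set them on a SparkConf instead).

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical record type, used only for illustration.
case class MyRecord(id: Int, name: String)

// Hypothetical registrator: registerClasses receives each Kryo that Spark builds.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[MyRecord])
  }
}

object KryoSetup {
  // Call this before the SparkContext is created.
  def configure(): Unit = {
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Fully qualified class name; adjust it if MyRegistrator lives in a package.
    System.setProperty("spark.kryo.registrator", "MyRegistrator")
  }
}
```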

