Posted to issues@spark.apache.org by "Matt Cheah (JIRA)" <ji...@apache.org> on 2015/07/17 07:13:04 UTC
[jira] [Commented] (SPARK-5269) BlockManager.dataDeserialize always creates a new serializer instance
[ https://issues.apache.org/jira/browse/SPARK-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630783#comment-14630783 ]
Matt Cheah commented on SPARK-5269:
-----------------------------------
I'd be interested in working on this with [~vladio]. Can this be assigned to either of us?
> BlockManager.dataDeserialize always creates a new serializer instance
> ---------------------------------------------------------------------
>
> Key: SPARK-5269
> URL: https://issues.apache.org/jira/browse/SPARK-5269
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Ivan Vergiliev
> Labels: performance, serializers
>
> BlockManager.dataDeserialize always creates a new instance of the serializer, which is pretty slow in some cases. I'm using Kryo serialization and have a custom registrator, and its register method is showing up as taking about 15% of the execution time in my profiles. This started happening after I increased the number of keys in a job with a shuffle phase by a factor of 40.
> One solution I can think of is to create a ThreadLocal SerializerInstance for the defaultSerializer, and only create a new one if a custom serializer is passed in. AFAICT a custom serializer is passed only from DiskStore.getValues, which in turn depends on the serializer passed to ExternalSorter. I don't know how often this is used, but I think this can still be a good solution for the standard use case.
> Oh, and also - ExternalSorter already has a SerializerInstance, so if the getValues method is called from a single thread, maybe we can pass that directly?
> I'd be happy to try a patch but would probably need a confirmation from someone that this approach would indeed work (or an idea for another).
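
The ThreadLocal idea in the quoted description can be sketched roughly as follows. This is only an illustration in plain Java, not Spark code: SerializerCacheSketch, its nested Serializer and SerializerInstance interfaces, and the INSTANCES_CREATED counter are hypothetical stand-ins, and the real BlockManager signatures may differ. It assumes the default serializer's instances are safe to reuse within a single thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; these are not Spark's classes.
final class SerializerCacheSketch {
    interface SerializerInstance { String deserialize(byte[] bytes); }
    interface Serializer { SerializerInstance newInstance(); }

    // Counts how many instances were built, so the effect of caching is visible.
    static final AtomicInteger INSTANCES_CREATED = new AtomicInteger();

    // Stands in for the default serializer whose instances are costly to build
    // (for example, one that runs a custom registrator on every newInstance()).
    static final Serializer DEFAULT_SERIALIZER = () -> {
        INSTANCES_CREATED.incrementAndGet();
        return bytes -> new String(bytes, StandardCharsets.UTF_8);
    };

    // One cached default instance per thread, created lazily on first use.
    private static final ThreadLocal<SerializerInstance> DEFAULT_INSTANCE =
        ThreadLocal.withInitial(DEFAULT_SERIALIZER::newInstance);

    // Reuse the thread-local instance unless a custom serializer is passed in,
    // in which case a fresh instance is created, as happens today.
    static String dataDeserialize(byte[] bytes, Serializer custom) {
        SerializerInstance instance =
            (custom == null) ? DEFAULT_INSTANCE.get() : custom.newInstance();
        return instance.deserialize(bytes);
    }
}
```

With this shape, repeated calls that pass no custom serializer pay the instance-creation cost once per thread, while calls that pass a custom serializer behave as before.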