Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/06/20 02:07:00 UTC

[jira] [Resolved] (SPARK-28112) Fix Kryo exception perf. bottleneck in tests due to absence of ML/MLlib classes

     [ https://issues.apache.org/jira/browse/SPARK-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-28112.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 24916
[https://github.com/apache/spark/pull/24916]

> Fix Kryo exception perf. bottleneck in tests due to absence of ML/MLlib classes
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-28112
>                 URL: https://issues.apache.org/jira/browse/SPARK-28112
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Tests
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Assignee: Josh Rosen
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In a nutshell, it looks like the absence of ML / MLlib classes on the classpath causes code in KryoSerializer to throw and catch ClassNotFoundExceptions every time a new serializer is instantiated in {{newInstance()}}. This isn't a performance problem in production (where MLlib is on the classpath), but it's a huge issue in tests and appears to account for an enormous amount of test time.
> We can address this problem by performing the class existence checks once and storing the results in the KryoSerializer instance, rather than repeating the checks on each {{newInstance()}} call. This reduces the total number of ClassNotFoundExceptions thrown.
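The caching idea described above can be sketched in Scala as follows. This is a minimal illustration under stated assumptions, not the code from pull request 24916: the class `OptionalClassRegistry`, its methods, and the class name `org.example.ml.MissingClass` are all hypothetical. The sketch assumes optional classes are resolved with `Class.forName`, which throws ClassNotFoundException when a class is absent, and it contrasts re-resolving on every call with resolving once into a `lazy val`.

```scala
import scala.util.Try

// Hypothetical illustration only; these names are not Spark's KryoSerializer API.
class OptionalClassRegistry(classNames: Seq[String]) {
  // Number of Class.forName attempts, to make the cost visible.
  @volatile var lookups: Int = 0

  private def resolve(): Seq[Class[_]] = classNames.flatMap { name =>
    lookups += 1
    // Class.forName throws ClassNotFoundException for an absent class;
    // Try turns that into None so the class is simply skipped.
    Try(Class.forName(name)).toOption
  }

  // Before: every call re-resolves the names, throwing one exception
  // per missing class each time.
  def newInstanceUncached(): Seq[Class[_]] = resolve()

  // After: resolve once per registry (lazily), then reuse the result.
  private lazy val loadableClasses: Seq[Class[_]] = resolve()
  def newInstanceCached(): Seq[Class[_]] = loadableClasses
}

object Demo {
  def main(args: Array[String]): Unit = {
    // The second name is hypothetical and assumed not to be on the classpath.
    val names = Seq("java.lang.String", "org.example.ml.MissingClass")
    val uncached = new OptionalClassRegistry(names)
    val cached = new OptionalClassRegistry(names)
    (1 to 1000).foreach { _ =>
      uncached.newInstanceUncached()
      cached.newInstanceCached()
    }
    println(s"uncached lookups: ${uncached.lookups}") // 2000
    println(s"cached lookups: ${cached.lookups}")     // 2
    println(cached.newInstanceCached().map(_.getName).mkString(", ")) // java.lang.String
  }
}
```

With 1000 simulated serializer instantiations, the uncached path attempts 2000 lookups (and throws 1000 exceptions for the missing class), while the cached path attempts only 2. The `lazy val` keeps the one-time cost off the constructor path and is initialized at most once per instance.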



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
