Posted to issues@spark.apache.org by "Josh Rosen (Jira)" <ji...@apache.org> on 2022/07/03 20:51:00 UTC

[jira] [Resolved] (SPARK-39489) Improve EventLoggingListener and ReplayListener performance by replacing Json4S ASTs with Jackson trees

     [ https://issues.apache.org/jira/browse/SPARK-39489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-39489.
--------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 36885
[https://github.com/apache/spark/pull/36885]

> Improve EventLoggingListener and ReplayListener performance by replacing Json4S ASTs with Jackson trees
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39489
>                 URL: https://issues.apache.org/jira/browse/SPARK-39489
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Spark's event log JsonProtocol currently uses Json4s ASTs to generate and parse JSON. Json4s overhead accounts for a significant proportion of the time spent in JsonProtocol. Replacing Json4s with direct use of Jackson APIs can significantly improve performance (~2x improvement for both writing and reading in my own local microbenchmarks).
> This improvement translates to faster history server load times and reduced load on the Spark driver. It also makes it less likely that events are dropped because the listener cannot keep up, which in turn reduces the likelihood of inconsistent Spark UIs.
> Reducing our usage of Json4s is also a step towards eventually removing our dependency on Json4s: Spark's current use of Json4s creates library conflicts for end users who want to adopt Json4s 4.x (see the discussion on the PRs for SPARK-36408). If Spark can eventually remove its Json4s dependency, such conflicts will be eliminated completely.
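
For readers unfamiliar with the two styles, here is a minimal sketch of what "direct usage of Jackson APIs" can look like: fields are streamed to a JsonGenerator on the write path and read back from a JsonNode tree on the read path, with no intermediate Json4s AST. This is not the code from the pull request. The StageEvent type, the JacksonEventSketch object, and the two field names are assumptions made up for illustration, and the sketch assumes jackson-databind is on the classpath.

```scala
import java.io.StringWriter

import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

object JacksonEventSketch {
  // Hypothetical event type, for illustration only (not a Spark class).
  final case class StageEvent(name: String, stageId: Int)

  private val factory = new JsonFactory()
  private val mapper = new ObjectMapper()

  // Write path: emit each field straight to the generator instead of
  // first building an AST and then rendering it.
  def toJson(event: StageEvent): String = {
    val writer = new StringWriter()
    val gen = factory.createGenerator(writer)
    try {
      gen.writeStartObject()
      gen.writeStringField("Event", event.name)
      gen.writeNumberField("Stage ID", event.stageId)
      gen.writeEndObject()
    } finally {
      gen.close() // flushes buffered output into the StringWriter
    }
    writer.toString
  }

  // Read path: parse into a Jackson tree and look fields up by name.
  def fromJson(json: String): StageEvent = {
    val node: JsonNode = mapper.readTree(json)
    StageEvent(node.get("Event").asText(), node.get("Stage ID").asInt())
  }
}
```

A round trip such as JacksonEventSketch.fromJson(JacksonEventSketch.toJson(event)) should return an equal StageEvent. Whether this style reproduces the ~2x figure quoted above depends on the workload and is not something the sketch measures.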


