You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/04/06 22:02:53 UTC

[GitHub] spark pull request: [SPARK-14222] Remove jackson-module-scala depe...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/12213

    [SPARK-14222] Remove jackson-module-scala dependency

    This patch removes our `jackson-module-scala` dependency in order to reduce the number of dependencies that we'll have to upgrade when adding experimental Scala 2.12 support. I think that our current use of this library is fairly minimal and removing the dependency seems like less work than having to help test and publish it for every Scala 2.12 milestone plus the final 2.12 release (see https://github.com/FasterXML/jackson-module-scala/issues/245).
    
    /cc @vanzin @andrewor14

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-jackson-module-scala

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12213
    
----
commit 4104b1a3e9009d58388483f5d0c582615ddbc95f
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-04-06T19:58:34Z

    Remove jackson-module-scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12213#discussion_r58791357
  
    --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/JacksonMessageWriter.scala ---
    @@ -47,7 +47,6 @@ private[v1] class JacksonMessageWriter extends MessageBodyWriter[Object]{
           super.writeValueAsString(t)
         }
       }
    -  mapper.registerModule(com.fasterxml.jackson.module.scala.DefaultScalaModule)
    --- End diff --
    
    This was added by @squito in #5940 in order to allow SparkStatusAPI POJOs to be serialized using Jackson. These POJOs are defined in https://github.com/apache/spark/blob/a4ead6d3881f071a2ae53ff1c961c6ac388cac1d/core/src/main/scala/org/apache/spark/status/api/v1/api.scala
    
    I wonder if the default values in the `ApplicationAttemptInfo` are going to be handled differently without the `jackson-module-scala` stuff. We might have to explicitly add annotations in order to pin those defaults.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206536724
  
    **[Test build #55145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55145/consoleFull)** for PR 12213 at commit [`4104b1a`](https://github.com/apache/spark/commit/4104b1a3e9009d58388483f5d0c582615ddbc95f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206631421
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206535731
  
    By the way, I'm almost certain that this patch is going to fail its first round of tests provided that our coverage is good enough; just wanted to get an early draft out for feedback to find out if there are any catastrophic blockers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206568595
  
    I think one place that would break without the scala module, and where we don't have unit tests right now, is rebuilding the SQL UI in the history server. The events are read from the log file and processed with jackson, and since they're case classes, that probably won't work without the scala module.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206630965
  
    **[Test build #55145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55145/consoleFull)** for PR 12213 at commit [`4104b1a`](https://github.com/apache/spark/commit/4104b1a3e9009d58388483f5d0c582615ddbc95f).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12213#discussion_r58790912
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -56,7 +55,7 @@ private[spark] object JsonProtocol {
     
       private implicit val format = DefaultFormats
     
    -  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
    --- End diff --
    
    /cc @carsonwang 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen closed the pull request at:

    https://github.com/apache/spark/pull/12213


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by cowtowncoder <gi...@git.apache.org>.
Github user cowtowncoder commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206555763
  
    I don't know enough about Spark internals to know if this makes sense, but there is a smaller alternative databinding, Jackson jr:
    
    https://github.com/FasterXML/jackson-jr
    
    which does use Jackson's streaming parser+generator (`jackson-core`), but not databinding. This makes it about 80% smaller to use. There is also uber-jar variant with shaded `jackson-core`, which would remove any conflicts with various jackson versions, with size of 350kB (jackson-jr-objects which does not bundle jackson-core is about 75kB).
    Jackson-jr does support both "untyped" access (Maps, Lists, wrappers) and very basic Bean-binding, and there is even simple optional tree model (jackson-jr-stree, 24kB); and performance-wise is comparable to full Jackson databinding for most use cases.
    
    So using this might allow removal of `jackson-databind`, and this in turn could help with some of version compatibility problems as many spark jobs use Jackson for their own needs.
    But as I said, I don't know how heavy usage is and whether this is a practical possibility.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12213#discussion_r58792709
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala ---
    @@ -79,7 +78,7 @@ private[spark] class RDDOperationScope(
      * An RDD scope tracks the series of operations that created a given RDD.
      */
     private[spark] object RDDOperationScope extends Logging {
    -  private val jsonMapper = new ObjectMapper().registerModule(DefaultScalaModule)
    --- End diff --
    
    This was added by @andrewor14 in #5729 as part of the initial DAG visualization patch. Here, the only class that we serialize happens to be `RDDOperationScope`, so I think this removal is addressed by the explicit `@JsonProperty` annotations that I added above.
    
    Let me quickly double-check that the handling of `Option` is correct, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12213#discussion_r58790852
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -56,7 +55,7 @@ private[spark] object JsonProtocol {
     
       private implicit val format = DefaultFormats
     
    -  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
    --- End diff --
    
    This was added in #10061 in order to allow the new `SparkListenerSQLExecutionStart` and `SparkListenerSQLExecutionEnd` events to be written to the event log using Jackson. I'll see if there's an existing unit test for roundtrip serialization of these events.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206632109
  
    Okay, so it turns out that removing this is going to be kind of tricky right now because we'd take on the burden of ensuring proper serialization of Scala `Option` types and, from experience, that's going to become a bit messy.
    
    Therefore, I'm going to fall back on an alternative proposal: I'll take on the burden of getting jackson-module-scala to run on 2.12 and then will work to move from a lot of imperative json4s code to Jackson databind (https://issues.apache.org/jira/browse/SPARK-12141) so that we can eventually remove json4s and consolidate on Jackson (https://issues.apache.org/jira/browse/SPARK-14439). Once we've done that, if it later turns out that we need to remove jackson-module-scala then we can just absorb / inline whatever portion of it we actually use.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by cowtowncoder <gi...@git.apache.org>.
Github user cowtowncoder commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206566391
  
    @JoshRosen ah. Yes, good point on usage via json4s.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206564355
  
    Besides a few direct uses of Jackson, I think that Spark also uses it indirectly through `json4s-jackson` so switching to `jackson-jr` might be a little tricky right now. Thanks for pointing out that library, though: it's going to be a huge help if we do wind up having to shade / hide the full version of Jackson from our classpath.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206631428
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55145/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222] Remove jackson-module-scala depe...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206534871
  
    /cc @aarondav and @ahirreddy, do you know of any pitfalls that I watch out for in removing this dependency?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14222][BUILD] Remove jackson-module-sca...

Posted by cowtowncoder <gi...@git.apache.org>.
Github user cowtowncoder commented on the pull request:

    https://github.com/apache/spark/pull/12213#issuecomment-206672520
  
    @JoshRosen probably makes sense, but just in case it needs to be revisited Jackson's core databind has much more support for "referential types" (which includes `Option`/`Optional`) as of 2.7, and supporting it should be much simpler with basic (de)serializer like one used for `AtomicReference`. I can help more if/when this is needed; for now I just mention it as forward-reference. I will actually need to work with @nbauernfeind (maintainer of Jackson Scala module) to ensure Scala module 2.7 is fully up-to-date with this part of type handling; earlier it required much more explicit work by module.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org