Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/04/21 06:36:24 UTC

[GitHub] [beam] mosche commented on pull request #17406: [BEAM-14334] Fix leakage of SparkContext in Spark runner tests to remove forkEvery 1

mosche commented on PR #17406:
URL: https://github.com/apache/beam/pull/17406#issuecomment-1104770253

   Hmm, I noticed a related issue here: `SparkContextOptions` doesn't work with `TestPipeline`, because `providedSparkContext` is dropped during the serde (serialization/deserialization) round trip that checks everything can be serialized before the pipeline actually runs :/
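
   As a rough analogy only (Beam serializes `PipelineOptions` as JSON, not with Java serialization), the sketch below uses a hypothetical `FakeOptions` class whose `transient` context field stands in for a property that is skipped during serialization. The value is set before the round trip and is gone afterwards, which is the same symptom `TestPipeline` produces for `providedSparkContext`.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;

   public class RoundTripDemo {
     // Hypothetical stand-in for pipeline options. The "context" cannot be serialized,
     // so it is marked transient and is skipped when the object is written out.
     static class FakeOptions implements Serializable {
       private static final long serialVersionUID = 1L;
       String appName;
       transient Object providedContext;
     }

     // Serialize and immediately deserialize a value, like a serde round trip check.
     @SuppressWarnings("unchecked")
     static <T extends Serializable> T roundTrip(T value)
         throws IOException, ClassNotFoundException {
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         out.writeObject(value);
       }
       try (ObjectInputStream in =
           new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
         return (T) in.readObject();
       }
     }

     public static void main(String[] args) throws Exception {
       FakeOptions options = new FakeOptions();
       options.appName = "test";
       options.providedContext = new Object();

       FakeOptions copy = roundTrip(options);
       System.out.println(copy.appName);                 // test
       System.out.println(copy.providedContext == null); // true: the context was dropped
     }
   }
   ```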
   
   IMHO `providedSparkContext` really doesn't belong in `PipelineOptions`: it can't be serialized, and the resulting behavior is very inconsistent... though removing it would be a breaking change. Instead, I suggest adding methods to `SparkContextFactory` for setting the provided Spark context. If a context is provided using `SparkContextOptions`, it will also be stored in the factory using `setProvidedSparkContext`.
   
   This also makes it possible to clear the provided Spark context again, which allows for much cleaner code.
   
   ```java
     // Externally managed context, if any; guarded by the SparkContextFactory class lock.
     private static JavaSparkContext providedSparkContext = null;

     /**
      * Set an externally managed {@link JavaSparkContext} that will be used if {@link
      * SparkContextOptions#getUsesProvidedSparkContext()} returns {@code true}.
      *
      * <p>A Spark context can also be provided using {@link
      * SparkContextOptions#setProvidedSparkContext(JavaSparkContext)}. However, it will be dropped
      * during serialization, potentially leading to confusing behavior. This is particularly the case
      * when used in tests with {@link org.apache.beam.sdk.testing.TestPipeline}.
      */
     public static synchronized void setProvidedSparkContext(JavaSparkContext providedSparkContext) {
       SparkContextFactory.providedSparkContext = providedSparkContext;
     }

     /** Clear a previously set externally managed {@link JavaSparkContext}. */
     public static synchronized void clearProvidedSparkContext() {
       providedSparkContext = null;
     }
   ```
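
   To make the proposed flow concrete without depending on Spark, here is a minimal, hypothetical sketch: `FakeContext` stands in for `JavaSparkContext` and `ContextFactory` for `SparkContextFactory`, and the method bodies are illustrative assumptions rather than actual Beam code. A context found on the options is also stored in the factory, so it is still available after the options lose it in a serde round trip, and clearing it afterwards keeps it from leaking into later tests.

   ```java
   // Hypothetical, Spark-free sketch. FakeContext stands in for JavaSparkContext and
   // ContextFactory for SparkContextFactory; names and bodies are illustrative only.
   public class ProvidedContextSketch {

     /** Stand-in for an externally managed JavaSparkContext. */
     static final class FakeContext {
       final String name;

       FakeContext(String name) {
         this.name = name;
       }
     }

     /** Stand-in for the factory holding the provided context in a static field. */
     static final class ContextFactory {
       private static FakeContext providedContext; // guarded by ContextFactory.class

       static synchronized void setProvidedContext(FakeContext context) {
         if (context == null) {
           throw new IllegalArgumentException("context must not be null");
         }
         providedContext = context;
       }

       static synchronized void clearProvidedContext() {
         providedContext = null;
       }

       /**
        * Uses the context carried by the options if it is still present (and stores it in the
        * factory as well); otherwise falls back to the one already registered in the factory.
        */
       static synchronized FakeContext getProvidedContext(FakeContext fromOptions) {
         if (fromOptions != null) {
           providedContext = fromOptions;
         }
         if (providedContext == null) {
           throw new IllegalStateException("No provided context set");
         }
         return providedContext;
       }
     }

     public static void main(String[] args) {
       ContextFactory.setProvidedContext(new FakeContext("test-context"));
       try {
         // After a round trip the options no longer carry the context (null),
         // but the factory still does.
         System.out.println(ContextFactory.getProvidedContext(null).name); // test-context
       } finally {
         // Clearing prevents the context from leaking into later tests.
         ContextFactory.clearProvidedContext();
       }
     }
   }
   ```

   Keeping the context in a static field of the factory, rather than in the options, is what lets it survive serialization; the explicit clear method is the counterpart that makes test setup and teardown symmetric.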
   
