Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/11 15:32:58 UTC

[GitHub] [beam] mosche commented on a diff in pull request #24862: [Spark Dataset runner] Fix SparkSessionFactory to better support running on a cluster.

mosche commented on code in PR #24862:
URL: https://github.com/apache/beam/pull/24862#discussion_r1067138298


##########
runners/spark/3/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkSessionFactory.java:
##########
@@ -87,6 +96,24 @@ public class SparkSessionFactory {
 
   private static final Logger LOG = LoggerFactory.getLogger(SparkSessionFactory.class);
 
+  // Patterns to exclude local JRE and certain artifact (groups) in Maven and Gradle cache.
+  private static final Collection<String> SPARK_JAR_EXCLUDES =

Review Comment:
   This isn't currently reflected by anything in the Gradle build config in the Beam repo. Of course, you could move it there and generate a resource from it that SparkSessionFactory then reads. But honestly, I think that would be a lot more harmful than valuable, as it obscures from the user what gets excluded.
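
To make the trade-off concrete, here is a rough sketch of the two options, not the code from this PR. The class name `JarExcludeSketch`, both method names, and the patterns are hypothetical placeholders; the real `SPARK_JAR_EXCLUDES` values are not shown in this hunk. `filter` applies a list that is visible right in the class, while `readPatterns` stands in for loading the same list from a build-generated resource (one pattern per line is an assumed format).

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch only; names and patterns are assumptions, not Beam code.
final class JarExcludeSketch {

  // Option 1: patterns live in code, so anyone reading the class sees what is excluded.
  // Placeholder values, NOT the actual excludes from the PR.
  static final Collection<String> IN_CODE_EXCLUDES =
      Arrays.asList("jre/lib/ext/", "/org/slf4j/");

  // Keeps only the classpath entries that contain none of the exclude patterns.
  static List<String> filter(Collection<String> classpath, Collection<String> excludes) {
    return classpath.stream()
        .filter(path -> excludes.stream().noneMatch(path::contains))
        .collect(Collectors.toList());
  }

  // Option 2: patterns come from a resource generated by the build
  // (assumed format: one pattern per line, blank lines ignored).
  // The list is then no longer visible when reading the class itself.
  static List<String> readPatterns(Reader resource) {
    return new BufferedReader(resource)
        .lines()
        .map(String::trim)
        .filter(line -> !line.isEmpty())
        .collect(Collectors.toList());
  }
}
```

Both options filter the same way; the difference is only where a user has to look to find out which jars are dropped.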


