Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2018/12/17 22:12:01 UTC

[GitHub] squito commented on a change in pull request #23337: [SPARK-26019][PYSPARK] Allow insecure py4j gateways

URL: https://github.com/apache/spark/pull/23337#discussion_r242335813
 
 

 ##########
 File path: python/pyspark/tests.py
 ##########
 @@ -2381,6 +2382,34 @@ def test_startTime(self):
         with SparkContext() as sc:
             self.assertGreater(sc.startTime, 0)
 
+    def test_forbid_insecure_gateway(self):
+        # By default, we fail immediately if you try to create a SparkContext
+        # with an insecure gateway
+        gateway = _launch_gateway(insecure=True)
+        with self.assertRaises(Exception) as context:
+            SparkContext(gateway=gateway)
+        self.assertIn("insecure py4j gateway", context.exception.message)
+        self.assertIn("spark.python.allowInsecurePy4j", context.exception.message)
+        self.assertIn("removed in Spark 3.0", context.exception.message)
+
+    def test_allow_insecure_gateway_with_conf(self):
+        with SparkContext._lock:
+            SparkContext._gateway = None
+            SparkContext._jvm = None
 
 Review comment:
  this part of the test really bothers me, so I'd like to explain it to reviewers.  Without this, the test passes -- but it passes even without the changes to the main code!  Or rather, it only passes when it's run as part of the entire suite; it would fail when run individually.
   
  What's happening is that `SparkContext._gateway` and `SparkContext._jvm` don't get reset by most tests (e.g., they are not reset in `sc.stop()`), so a test running before this one will set those variables, and then this test ends up holding on to a gateway which *does* have the `auth_token` set, and so the accumulator server would still work.
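  
  To make that concrete, here's a rough sketch of the leak as I understand it (illustrative only, not test code; it assumes the `_launch_gateway` helper from this PR is importable from `pyspark.java_gateway`):
  
  ```python
  # Sketch of the class-level state leak, *without* the reset above.
  from pyspark import SparkContext
  from pyspark.java_gateway import _launch_gateway  # assumed import path

  sc = SparkContext()   # caches a *secure* gateway on the class
  sc.stop()             # stops the context, but SparkContext._gateway
                        # and SparkContext._jvm are left in place

  # A later test that hands in its own insecure gateway never uses it:
  insecure_gateway = _launch_gateway(insecure=True)
  sc2 = SparkContext(gateway=insecure_gateway)
  # _ensure_initialized only stores a gateway when SparkContext._gateway
  # is None, so sc2 silently reuses the old, secure gateway -- the
  # accumulator server still works, and the test passes for the wrong reason.
  sc2.stop()
  ```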
   
  Now that in itself sounds crazy to me, and seems like a problem for things like Zeppelin.  I tried just adding these two lines into `sc.stop()`, but then when I ran all the tests, I got a lot of `java.io.IOException: error=23, Too many open files in system`.  So maybe something else is not getting cleaned up properly in the pyspark tests?
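  
  Roughly, that attempt amounted to the following (shown here as a standalone helper for clarity; in the actual attempt the two assignments went inside `SparkContext.stop()` itself):
  
  ```python
  # Sketch of the attempted fix: also drop the cached class-level
  # handles whenever a context is stopped.
  from pyspark import SparkContext

  def stop_and_reset(sc):
      sc.stop()
      with SparkContext._lock:
          SparkContext._gateway = None  # the two added lines
          SparkContext._jvm = None
  ```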
   
   I was hoping somebody else might have some ideas about what is going on or if there is a better way to do this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org