Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/08 15:42:55 UTC

[GitHub] [spark] nchammas commented on a change in pull request #31768: [SPARK-33436][PYSPARK] PySpark equivalent of SparkContext.hadoopConfiguration

nchammas commented on a change in pull request #31768:
URL: https://github.com/apache/spark/pull/31768#discussion_r589521677



##########
File path: python/pyspark/context.py
##########
@@ -1255,6 +1255,16 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
 
+    def hadoopConfiguration(self):
+        """
+        Returns the Hadoop configuration used for Hadoop code (e.g. file systems) that we reuse.
+
+        As it will be reused in all Hadoop RDDs, it is better not to modify it unless you
+        plan to set some global configurations for all Hadoop RDDs.
+        Returns a :class:`Configuration` object.
+        """
+        return self._jsc.hadoopConfiguration()

Review comment:
       The purpose of the ticket is as [you described](https://github.com/apache/spark/pull/31768#discussion_r589082773). 
   
    The story I relate in the ticket description is basically the same as yours: I wanted to set some Hadoop configs related to S3A; I couldn't find a direct way to do that; the best approach I could find went through `_jsc`, so I thought that merited an improvement to PySpark, giving it a direct means of doing the same thing.
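    
    For concreteness, here's a minimal sketch of the workaround I mean today (the specific `fs.s3a.*` keys and the endpoint value are just illustrative):
    
    ```python
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    
    # Current workaround: reach through the private _jsc handle (the JVM
    # JavaSparkContext) to get the org.apache.hadoop.conf.Configuration.
    hadoop_conf = sc._jsc.hadoopConfiguration()
    
    # Illustrative S3A settings; the keys and values here are only examples.
    hadoop_conf.set("fs.s3a.endpoint", "http://localhost:9000")
    hadoop_conf.set("fs.s3a.path.style.access", "true")
    
    # With the method proposed in this PR, the same thing would not need _jsc:
    # hadoop_conf = sc.hadoopConfiguration()
    ```
    
    Note that what comes back is the JVM-side Configuration object, so `set()`/`get()` calls go through Py4J rather than returning a plain Python dict.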




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


