Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/09 11:43:40 UTC
[GitHub] [spark] zero323 commented on a change in pull request #35410: [WIP][SPARK-38121][PYTHON][SQL] Use SparkSession instead of SQLContext inside PySpark
zero323 commented on a change in pull request #35410:
URL: https://github.com/apache/spark/pull/35410#discussion_r802573722
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -104,10 +105,25 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
.. versionadded:: 1.3.0
"""
- def __init__(self, jdf: JavaObject, sql_ctx: "SQLContext"):
- self._jdf = jdf
- self.sql_ctx = sql_ctx
- self._sc: SparkContext = cast(SparkContext, sql_ctx and sql_ctx._sc)
+ def __init__(
+ self,
+ jdf: JavaObject,
+ sql_ctx: Optional["SQLContext"] = None,
+ session: Optional["SparkSession"] = None,
+ ):
+ assert sql_ctx is not None or session is not None
+ self._session = session
+ self._sql_ctx = sql_ctx
Review comment:
Just thinking out loud. Why not stick to a single argument (`sql_ctx` or `session` ‒ it might require some research, but I'm fairly sure it is usually passed positionally) and
```python
self.session = sql_ctx if isinstance(sql_ctx, SparkSession) else sql_ctx.sparkSession
```
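To make the single-argument idea concrete, here is a minimal, self-contained sketch of the dispatch the snippet above describes. `SparkSession` and `SQLContext` are stubbed with placeholder classes so the normalization logic can run without a Spark installation; the real classes live in `pyspark.sql`, and the actual `DataFrame.__init__` signature is what the PR itself decides.

```python
# Stub stand-ins for the real pyspark.sql classes (illustration only).
class SparkSession:
    pass


class SQLContext:
    def __init__(self, session: SparkSession):
        # The real SQLContext exposes its owning session as `sparkSession`.
        self.sparkSession = session


class DataFrame:
    def __init__(self, jdf, sql_ctx):
        self._jdf = jdf
        # Accept either object in the one positional slot and normalize
        # to a SparkSession internally.
        self.session = (
            sql_ctx if isinstance(sql_ctx, SparkSession) else sql_ctx.sparkSession
        )


session = SparkSession()
print(DataFrame(None, session).session is session)               # True
print(DataFrame(None, SQLContext(session)).session is session)   # True
```

Either way the caller's positional argument keeps working, and internally the class only ever deals with a `SparkSession`.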
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org