Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/15 01:22:46 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

HyukjinKwon opened a new pull request #35517:
URL: https://github.com/apache/spark/pull/35517


   ### What changes were proposed in this pull request?
   
   This PR is a followup of https://github.com/apache/spark/pull/35410 to fix a mistake. `HiveContext` should set `spark.sql.catalogImplementation` to `hive` instead of `in-memory`.
   
   This PR also includes several changes:
   - Make `HiveContext.getOrCreate` work identically to `SQLContext.getOrCreate`.
   - Match the signature of `HiveContext.__init__` to that of `SQLContext.__init__` (although neither is meant to be called directly by users).
   
   ### Why are the changes needed?
   
   See https://github.com/apache/spark/pull/35410#discussion_r806358814
   
   ### Does this PR introduce _any_ user-facing change?
   
   No to end users, because this change has not been released yet.
   
   Without this fix, `HiveContext` creates a `SparkSession` without Hive support if no `SparkSession` is already running. See also https://github.com/apache/spark/pull/35410#discussion_r806358814.
   
   ### How was this patch tested?
   
   Manually tested:
   
   ```python
   spark.stop()
   from pyspark import SparkContext
   from pyspark.sql import HiveContext
   HiveContext.getOrCreate(SparkContext.getOrCreate()).getConf("spark.sql.catalogImplementation")
   ```
   
   **Before:**
   
   ```pyspark
   'in-memory'
   ```
   
   **After:**
   
   ```pyspark
   'hive'
   ```
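
The manual check above can be mimicked without a Spark installation. The following is only a toy sketch: `ToySession`, `get_active_or_create`, and `stop` are hypothetical stand-ins, not PySpark APIs, and it assumes (as the description suggests) that static conf is applied only when no session is already running.

```python
from typing import Any, Dict, Optional


class ToySession:
    """Hypothetical stand-in for a session; stores only its static conf."""

    _active: Optional["ToySession"] = None

    def __init__(self, conf: Dict[str, Any]) -> None:
        self.conf = dict(conf)

    @classmethod
    def get_active_or_create(cls, **static_conf: Any) -> "ToySession":
        # Assumption: static conf only takes effect when a new session is
        # built; an already-active session is returned unchanged.
        if cls._active is None:
            cls._active = cls(static_conf)
        return cls._active

    @classmethod
    def stop(cls) -> None:
        cls._active = None


KEY = "spark.sql.catalogImplementation"

ToySession.stop()
before = ToySession.get_active_or_create(**{KEY: "in-memory"}).conf[KEY]

ToySession.stop()
after = ToySession.get_active_or_create(**{KEY: "hive"}).conf[KEY]

# A second call while a session is active keeps the existing conf.
reused = ToySession.get_active_or_create(**{KEY: "in-memory"}).conf[KEY]

print(before, after, reused)  # in-memory hive hive
```

In this model the value passed at creation time is what sticks, which is why the wrong default only shows up after the running session has been stopped.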


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #35517:
URL: https://github.com/apache/spark/pull/35517#issuecomment-1040562928


   Thanks. Merging to master.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35517:
URL: https://github.com/apache/spark/pull/35517#discussion_r806372548



##########
File path: python/pyspark/sql/context.py
##########
@@ -705,19 +707,33 @@ class HiveContext(SQLContext):
 
     """
 
-    def __init__(self, sparkContext: SparkContext, jhiveContext: Optional[JavaObject] = None):
+    _static_conf = {"spark.sql.catalogImplementation": "hive"}
+
+    def __init__(
+        self,
+        sparkContext: SparkContext,
+        sparkSession: Optional[SparkSession] = None,
+        jhiveContext: Optional[JavaObject] = None,
+    ):
         warnings.warn(
             "HiveContext is deprecated in Spark 2.0.0. Please use "
             + "SparkSession.builder.enableHiveSupport().getOrCreate() instead.",
             FutureWarning,
         )
         static_conf = {}
         if jhiveContext is None:
-            static_conf = {"spark.sql.catalogImplementation": "in-memory"}
+            static_conf = HiveContext._static_conf
         # There can be only one running Spark context. That will automatically
         # be used in the Spark session internally.
-        session = SparkSession._getActiveSessionOrCreate(**static_conf)
-        SQLContext.__init__(self, sparkContext, session, jhiveContext)
+        if sparkSession is not None:
+            sparkSession = SparkSession._getActiveSessionOrCreate(**static_conf)
+        SQLContext.__init__(self, sparkContext, sparkSession, jhiveContext)
+
+    @classmethod
+    def _get_or_create(
+        cls: Type["SQLContext"], sc: SparkContext, **static_conf: Any
+    ) -> "SQLContext":
+        return SQLContext._get_or_create(sc, **HiveContext._static_conf)

Review comment:
       This is to make sure `HiveContext.getOrCreate` sets `spark.sql.catalogImplementation` correctly.
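
The override pattern in this hunk can be illustrated in isolation. This is a rough sketch with hypothetical classes (`ToySQLContext`, `ToyHiveContext`) rather than the real PySpark ones, and it assumes the public `get_or_create` delegates to an overridable `_get_or_create` hook.

```python
from typing import Any, Dict


class ToySQLContext:
    """Hypothetical stand-in for SQLContext."""

    def __init__(self, conf: Dict[str, Any]) -> None:
        self.conf = conf

    @classmethod
    def get_or_create(cls, **static_conf: Any) -> "ToySQLContext":
        # Assumed design: the public entry point calls an overridable hook.
        return cls._get_or_create(**static_conf)

    @classmethod
    def _get_or_create(cls, **static_conf: Any) -> "ToySQLContext":
        return cls(dict(static_conf))


class ToyHiveContext(ToySQLContext):
    """Hypothetical stand-in for HiveContext."""

    _static_conf = {"spark.sql.catalogImplementation": "hive"}

    @classmethod
    def _get_or_create(cls, **static_conf: Any) -> "ToySQLContext":
        # Ignore caller-supplied conf and always pass the Hive setting.
        return super()._get_or_create(**cls._static_conf)


plain = ToySQLContext.get_or_create()
hive = ToyHiveContext.get_or_create()
print(plain.conf)  # {}
print(hive.conf)   # {'spark.sql.catalogImplementation': 'hive'}
```

Keeping the setting in a class-level `_static_conf` means `__init__` and `_get_or_create` read the same value, so the two entry points cannot drift apart.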






[GitHub] [spark] HyukjinKwon commented on pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35517:
URL: https://github.com/apache/spark/pull/35517#issuecomment-1040248403


   @viirya would you mind taking a quick look, please? I am 99.99% sure about this change...






[GitHub] [spark] viirya closed pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
viirya closed pull request #35517:
URL: https://github.com/apache/spark/pull/35517


   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35517:
URL: https://github.com/apache/spark/pull/35517#discussion_r806372392



##########
File path: python/pyspark/sql/context.py
##########
@@ -705,19 +707,33 @@ class HiveContext(SQLContext):
 
     """
 
-    def __init__(self, sparkContext: SparkContext, jhiveContext: Optional[JavaObject] = None):
+    _static_conf = {"spark.sql.catalogImplementation": "hive"}
+
+    def __init__(

Review comment:
       This matches the signature of `SQLContext.__init__`.
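
One way to check that two constructors line up is to compare their parameter layouts with `inspect.signature`. The classes below are hypothetical stand-ins; the parameter name `jsqlContext` is an assumption about `SQLContext.__init__` and does not appear in this diff.

```python
import inspect
from typing import Any, Callable, List, Optional, Tuple


class ToySQLContext:
    def __init__(
        self,
        sparkContext: Any,
        sparkSession: Optional[Any] = None,
        jsqlContext: Optional[Any] = None,  # assumed parameter name
    ) -> None: ...


class OldToyHiveContext:
    # Shape of the constructor before the change in this hunk.
    def __init__(self, sparkContext: Any, jhiveContext: Optional[Any] = None) -> None: ...


class NewToyHiveContext:
    # Shape of the constructor after the change in this hunk.
    def __init__(
        self,
        sparkContext: Any,
        sparkSession: Optional[Any] = None,
        jhiveContext: Optional[Any] = None,
    ) -> None: ...


def shape(func: Callable[..., Any]) -> List[Tuple[Any, bool]]:
    """Return (kind, has_default) per parameter, ignoring names and annotations."""
    params = inspect.signature(func).parameters.values()
    return [(p.kind, p.default is not inspect.Parameter.empty) for p in params]


old_matches = shape(OldToyHiveContext.__init__) == shape(ToySQLContext.__init__)
new_matches = shape(NewToyHiveContext.__init__) == shape(ToySQLContext.__init__)
print(old_matches, new_matches)  # False True
```

The comparison deliberately ignores parameter names, since the last argument is named differently in the two classes.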








[GitHub] [spark] HyukjinKwon commented on pull request #35517: [SPARK-38121][PYTHON][SQL][FOLLOW-UP] Set 'spark.sql.catalogImplementation' to 'hive' in HiveContext

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35517:
URL: https://github.com/apache/spark/pull/35517#issuecomment-1040902420


   Thanks!!!!

