Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/11 05:58:23 UTC

[GitHub] [spark] itholic opened a new pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

itholic opened a new pull request #35488:
URL: https://github.com/apache/spark/pull/35488


   ### What changes were proposed in this pull request?
   
   This PR proposes showing a warning message when a pandas-on-Spark session is created under ANSI mode.
   
   The message looks like this:
   ```python
   >>> ps.Series(['a', 'b', 'c'])
   .../spark/python/pyspark/pandas/utils.py:969: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause the unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   ```
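
The traceback line above suggests that the message goes through Python's standard `warnings` module with a dedicated warning category. A minimal sketch of that pattern follows, assuming a stand-in `AdviceWarning` class and a hypothetical `log_advice` helper; neither is copied from the Spark source.

```python
import warnings


class AdviceWarning(Warning):
    """Stand-in for PandasAPIOnSparkAdviceWarning (hypothetical name)."""


def log_advice(message: str) -> None:
    # Emit the advice as a warning so users can filter it by category.
    warnings.warn(message, AdviceWarning)


def demo() -> list:
    # Capture warnings instead of printing them, to inspect what was emitted.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        log_advice("The config 'spark.sql.ansi.enabled' is set to True.")
    return [(w.category.__name__, str(w.message)) for w in caught]
```

Because the category is a distinct class, a user who accepts the risk could silence it with `warnings.simplefilter("ignore", AdviceWarning)` in this sketch.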
   
   ### Why are the changes needed?
   
   pandas API on Spark follows the behavior of pandas, not SQL, so unexpected behavior can occur when ANSI mode is on (that is, when "spark.sql.ansi.enabled" is True).
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   When the session for pandas API on Spark is initialized with "spark.sql.ansi.enabled" set to True, the warning message shown above is displayed.
   
   
   ### How was this patch tested?
   
   The existing tests should pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic edited a comment on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic edited a comment on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1039647930


   Thanks, @bjornjorgensen 
   
   JIRA updated, and sure, let me consider adding this note somewhere in the pandas-on-Spark documentation.
   
   I think maybe we can document this under [Best Practice](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/best_practices.html) or somewhere similar.
   
   Also, "the" is removed :-)




[GitHub] [spark] ueshin commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r821035033



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):

Review comment:
       Currently we always see the warning the first time this runs.
   This should be `spark.conf.get("spark.sql.ansi.enabled", "false") == "true"`.
   cc @itholic, @HyukjinKwon 
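
The problem can be reproduced without a Spark session. The sketch below assumes, as the suggested fix implies, that the config lookup returns the string "true" or "false" rather than a Python bool; `fake_conf_get` is a hypothetical stand-in for `spark.conf.get`.

```python
def fake_conf_get(key: str, default: str = "false") -> str:
    # Hypothetical stand-in for spark.conf.get: values come back as strings.
    settings = {"spark.sql.ansi.enabled": "false"}
    return settings.get(key, default)


def buggy_check() -> bool:
    # Any non-empty string is truthy, so "false" still passes this test.
    return bool(fake_conf_get("spark.sql.ansi.enabled"))


def fixed_check() -> bool:
    # Compare against the literal string instead of relying on truthiness.
    return fake_conf_get("spark.sql.ansi.enabled", "false") == "true"
```

With ANSI mode off, `buggy_check()` still reports the mode as enabled, while `fixed_check()` does not.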






[GitHub] [spark] HyukjinKwon commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1039823604


   Merged to master.




[GitHub] [spark] itholic commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r806335285



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):
+        log_advice(
+            "The config 'spark.sql.ansi.enabled' is set to True."

Review comment:
       Thanks!






[GitHub] [spark] ueshin commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r821035033



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):

Review comment:
       Currently we always see the warning the first time this runs.
   This should be `spark.conf.get("spark.sql.ansi.enabled", "false") == "true"`, or `spark._jconf.ansiEnabled()`.
   cc @itholic, @HyukjinKwon 






[GitHub] [spark] ueshin commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
ueshin commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r806328063



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):
+        log_advice(
+            "The config 'spark.sql.ansi.enabled' is set to True."

Review comment:
       nit: I guess we need a space at the end of this string.
   
   ```
   "... set to True. "
   ```
   
   Otherwise, the two sentences are connected without a space like ` ... set to True.This ...`.
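
Python joins adjacent string literals with no separator, which is why the trailing space matters. A small illustration follows, using the first message fragment from the diff and an abbreviated second fragment (the shortened wording is for illustration only).

```python
# Adjacent string literals are concatenated exactly as written.
without_space = (
    "The config 'spark.sql.ansi.enabled' is set to True."
    "This can cause unexpected behavior."
)
with_space = (
    "The config 'spark.sql.ansi.enabled' is set to True. "
    "This can cause unexpected behavior."
)
```

Only `with_space` separates the two sentences correctly.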






[GitHub] [spark] itholic commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1038500178


   Updated!




[GitHub] [spark] HyukjinKwon commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1036040353


   The configuration is off by default, so it should probably be fine, but it would be great to list a few examples of the unexpected behaviour in the PR description, as @bjornjorgensen mentioned.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r821228724



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):

Review comment:
       Let's make sure to test this next time, especially when there is no existing test ...
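
A regression test for this check does not need a running cluster if the config lookup is injected. The following is only a sketch: `warn_if_ansi` and `AdviceWarning` are hypothetical stand-ins, not names from the Spark code base, and the lookup is assumed to return the strings "true" or "false".

```python
import warnings
from typing import Callable


class AdviceWarning(Warning):
    """Hypothetical stand-in for the advice warning category."""


def warn_if_ansi(conf_get: Callable[[str, str], str]) -> None:
    # Warn only when the config value is the literal string "true".
    if conf_get("spark.sql.ansi.enabled", "false") == "true":
        warnings.warn(
            "The config 'spark.sql.ansi.enabled' is set to True.",
            AdviceWarning,
        )


def count_advice(value: str) -> int:
    # Run the check against a fake config and count emitted advice warnings.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_if_ansi(lambda key, default: value)
    return sum(issubclass(w.category, AdviceWarning) for w in caught)
```

A test along these lines would assert one warning for "true" and none for "false", which is the case the original truthiness check got wrong.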






[GitHub] [spark] itholic commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r821140585



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,18 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):

Review comment:
       Thanks!! Let me open the follow-up to fix this.






[GitHub] [spark] itholic commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1039647930


   Thanks, @bjornjorgensen 
   
   JIRA updated, and sure, let me consider adding this note somewhere in the pandas-on-Spark documentation.
   
   I think we can document this under [Best Practice](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/best_practices.html) or somewhere similar.
   
   Also, "the" is removed :-)




[GitHub] [spark] bjornjorgensen commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
bjornjorgensen commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1036034626


   "This can cause the unexpected behavior"
   But what is the unexpected behavior?
   
   If this is something big, then we need some documentation for this. 




[GitHub] [spark] HyukjinKwon closed pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #35488:
URL: https://github.com/apache/spark/pull/35488


   




[GitHub] [spark] itholic commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
itholic commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1036683889


   Thanks for the review!
   
   Let me update the PR description after some tests.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35488:
URL: https://github.com/apache/spark/pull/35488#discussion_r804497038



##########
File path: python/pyspark/pandas/utils.py
##########
@@ -467,11 +467,16 @@ def is_testing() -> bool:
 
 def default_session() -> SparkSession:
     spark = SparkSession.getActiveSession()
-    if spark is not None:
-        return spark
+    if spark is None:
+        spark = SparkSession.builder.appName("pandas-on-Spark").getOrCreate()
+
+    if spark.conf.get("spark.sql.ansi.enabled"):
+        log_advice(
+            "The config 'spark.sql.ansi.enabled' is set to True. This can cause the unexpected behavior "
+            "from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL."

Review comment:
       @itholic can you fix the linter failure?






[GitHub] [spark] HyukjinKwon commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1035910152


   cc @ueshin FYI




[GitHub] [spark] bjornjorgensen commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
bjornjorgensen commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1036045494


   SQL ANSI mode 'spark.sql.ansi.enabled' is set to True. This is an experimental config. For more information, see spark.apache.org/docs/latest/sql-ref-ansi-compliance.html




[GitHub] [spark] bjornjorgensen commented on pull request #35488: [SPARK-38183][PYTHON] Show warning when creating pandas-on-Spark session under ANSI mode.

Posted by GitBox <gi...@apache.org>.
bjornjorgensen commented on pull request #35488:
URL: https://github.com/apache/spark/pull/35488#issuecomment-1038757866


   @itholic, thanks for sharing this very useful information.
   I hope you will update the JIRA ticket with it, and if we have a web page with pandas API on Spark information, we can consider adding it there too.
   
   Please remove "the" from the message, so that it reads:
   
   "The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL."

