Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/05 19:59:32 UTC

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38070: [SPARK-38004][PYTHON] Mangle dupe cols documentation

HyukjinKwon commented on code in PR #38070:
URL: https://github.com/apache/spark/pull/38070#discussion_r985387151


##########
python/pyspark/pandas/namespace.py:
##########
@@ -1049,6 +1049,10 @@ def read_excel(
         Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
         'X'...'X'. Passing in False will cause data to be overwritten if there
         are duplicate names in the columns.
+        .. note:: This process is not case-sensitive. If two columns are spelled
+            the same with different casing, then an ambiguity error will arise.
+            Specifying ``spark.conf.set("spark.sql.caseSensitive", "true")``
+            will resolve this issue.

Review Comment:
   Using the `spark.sql.caseSensitive` configuration is actually discouraged. I think this is a general issue in pandas API on Spark itself rather than in this specific API, so it should probably be written down somewhere like https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/faq.html
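
   For context, the mangling behaviour described in the docstring above can be sketched in plain Python. This is a hypothetical helper, not the actual pandas-on-Spark implementation: duplicates are renamed 'X', 'X.1', ..., 'X.N', and when names are compared case-insensitively (Spark's default), 'x' and 'X' collide and the second one gets mangled.

   ```python
   def mangle_dupe_cols(names, case_sensitive=True):
       """Rename duplicate column names as 'X', 'X.1', ..., 'X.N'.

       Simplified sketch of the behaviour documented for read_excel;
       not the real pandas-on-Spark code. With case_sensitive=False,
       names differing only in casing are treated as duplicates, which
       mirrors Spark's default case-insensitive column resolution.
       """
       seen = {}
       result = []
       for name in names:
           key = name if case_sensitive else name.lower()
           if key not in seen:
               seen[key] = 0
               result.append(name)
           else:
               seen[key] += 1
               result.append(f"{name}.{seen[key]}")
       return result

   print(mangle_dupe_cols(["X", "X", "X"]))                   # ['X', 'X.1', 'X.2']
   print(mangle_dupe_cols(["x", "X"], case_sensitive=False))  # ['x', 'X.1']
   print(mangle_dupe_cols(["x", "X"], case_sensitive=True))   # ['x', 'X'] -- no collision
   ```

   The last call illustrates why the PR's note mentions `spark.sql.caseSensitive`: with case-sensitive resolution 'x' and 'X' are distinct columns and no mangling (or ambiguity error) occurs.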



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org