Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/20 09:21:53 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #34957: [SPARK-37668][PYTHON] 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert

HyukjinKwon commented on a change in pull request #34957:
URL: https://github.com/apache/spark/pull/34957#discussion_r772200810



##########
File path: python/pyspark/pandas/frame.py
##########
@@ -3989,9 +3989,14 @@ def insert(
             )
 
         if is_name_like_tuple(column):
-            if len(column) != len(self.columns.levels):
-                # To be consistent with pandas
-                raise ValueError('"column" must have length equal to number of column levels.')
+            if isinstance(self.columns, pd.MultiIndex):

Review comment:
       Actually, I think pandas-on-Spark does support inserting a tuple-named column when the columns are a `MultiIndex`:
   
   ```python
   >>> import pandas as pd
   >>> pdf = pd.DataFrame({'a': [1]})
   >>> pdf.columns = pd.MultiIndex.from_tuples([("a", "b")])
   >>> import pyspark.pandas as ps
   >>> psdf = ps.from_pandas(pdf)
   >>> psdf.insert(0, ("b", "c"), [1])
   >>> psdf
      b  a
      c  b
   0  1  1
   ```

##########
File path: python/pyspark/pandas/frame.py
##########
@@ -3989,9 +3989,14 @@ def insert(
             )
 
         if is_name_like_tuple(column):
-            if len(column) != len(self.columns.levels):
-                # To be consistent with pandas
-                raise ValueError('"column" must have length equal to number of column levels.')
+            if isinstance(self.columns, pd.MultiIndex):
+                if len(column) != len(self.columns.levels):
+                    # To be consistent with pandas
+                    raise ValueError('"column" must have length equal to number of column levels.')
+            else:
+                raise NotImplementedError(
+                    "Tuple-like name is not supported to non-MultiIndex column"
+                )
 
         if column in self.columns:
             raise ValueError("cannot insert %s, already exists" % column)

Review comment:
       Let's explicitly call `str(column)` here. Otherwise, the `%` formatting fails when `column` is a `tuple` with more than one element, because the tuple is unpacked as separate format arguments. For example:
   
   ```
       raise ValueError("cannot insert %s, already exists" % column)
   TypeError: not all arguments converted during string formatting
   ```
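
A minimal sketch of the failure and the suggested fix, in plain Python with no Spark required. The tuple `("b", "c")` is a hypothetical column name chosen for illustration; the message string is the one from the diff above.

```python
column = ("b", "c")  # hypothetical tuple-like column name

# Passing the tuple directly makes % treat it as two format arguments
# for a single %s placeholder, which raises TypeError.
try:
    "cannot insert %s, already exists" % column
    raised = False
except TypeError:
    raised = True

# Converting to str first formats the whole tuple as one value.
message = "cannot insert %s, already exists" % str(column)
print(message)
```

Wrapping the value as `(column,)` or using `str.format` would presumably work as well; `str(column)` is simply the form suggested in this comment.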




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


