Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/08/08 04:52:19 UTC
[GitHub] [spark] itholic commented on a diff in pull request #42369: [SPARK-44695][PYTHON] Improve error message for `DataFrame.toDF`
itholic commented on code in PR #42369:
URL: https://github.com/apache/spark/pull/42369#discussion_r1286592428
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1732,6 +1732,23 @@ def to(self, schema: StructType) -> "DataFrame":
to.__doc__ = PySparkDataFrame.to.__doc__
def toDF(self, *cols: str) -> "DataFrame":
+ expected_len_cols = len(self.columns)
+ actual_len_cols = len(cols)
+ if expected_len_cols != actual_len_cols:
Review Comment:
Yeah, the existing error is raised from the JVM and captured by PySpark as below:
```python
Traceback (most recent call last):
File "/.../spark/python/pyspark/sql/tests/test_dataframe.py", line 1028, in test_toDF_with_string
df.toDF("key")
File "/.../spark/python/pyspark/sql/dataframe.py", line 5324, in toDF
jdf = self._jdf.toDF(self._jseq(cols))
File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 185, in deco
raise converted from None
pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (2): _1, _2
New column names (1): key
JVM stacktrace:
java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (2): _1, _2
New column names (1): key
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.Dataset.toDF(Dataset.scala:534)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
```
So I wanted to make it shorter and capture it with a `PySparkValueError`, as below:
```python
Traceback (most recent call last):
File "/.../spark/python/pyspark/sql/tests/test_dataframe.py", line 1028, in test_toDF_with_string
df.toDF("key")
File "/.../spark/python/pyspark/sql/dataframe.py", line 5310, in toDF
raise PySparkValueError(
pyspark.errors.exceptions.base.PySparkValueError: [LENGTH_MISMATCH] The length of `cols` must be 2, got 1.
```
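For illustration, a minimal standalone sketch of the Python-side length check being discussed; `PySparkValueError` is stood in by a plain `ValueError` subclass here (the real class lives in `pyspark.errors`), and `to_df_check` is a hypothetical helper, not the actual `DataFrame.toDF` implementation:

```python
# Hypothetical stand-in for pyspark.errors.PySparkValueError,
# used only so this sketch runs without a Spark installation.
class PySparkValueError(ValueError):
    pass


def to_df_check(current_columns, new_cols):
    """Validate new column names before renaming, mirroring the PR's check.

    Raises PySparkValueError with a LENGTH_MISMATCH message when the
    number of new names differs from the number of existing columns.
    """
    expected_len_cols = len(current_columns)
    actual_len_cols = len(new_cols)
    if expected_len_cols != actual_len_cols:
        raise PySparkValueError(
            f"[LENGTH_MISMATCH] The length of `cols` must be "
            f"{expected_len_cols}, got {actual_len_cols}."
        )
    # On success, return the old-name -> new-name mapping.
    return dict(zip(current_columns, new_cols))


# Matches the traceback above: 2 existing columns, 1 new name -> error.
try:
    to_df_check(["_1", "_2"], ["key"])
except PySparkValueError as e:
    print(e)  # [LENGTH_MISMATCH] The length of `cols` must be 2, got 1.
```

This keeps the failure on the Python side, so the user sees a short, classified error instead of a py4j-captured JVM stack trace.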
Do we want to just keep the current behavior and revert the changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org