Posted to issues@spark.apache.org by "Maxwell Conradt (Jira)" <ji...@apache.org> on 2022/08/22 11:51:00 UTC
[jira] [Created] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark
Maxwell Conradt created SPARK-40178:
---------------------------------------
Summary: Rebalance/Repartition Hints Not Working in PySpark
Key: SPARK-40178
URL: https://issues.apache.org/jira/browse/SPARK-40178
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.2.2, 3.3.0, 3.2.1, 3.2.0
Environment: Mac OSX 11.4 Big Sur
Python 3.9.7
Spark version >= 3.2.0 (perhaps before as well).
Reporter: Maxwell Conradt
Fix For: 3.4.0, 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.2.0
Partitioning hints that take column parameters, such as REBALANCE and REPARTITION, fail in PySpark with an `AnalysisException` because the column parameters are not converted to Catalyst `Expression` instances before being passed to the hint resolver.
The behavior of the hints is documented [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types].
Example:
{code:python}
>>> df = spark.range(1024)
>>>
>>> df
DataFrame[id: bigint]
>>> df.hint("rebalance", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
>>> df.hint("repartition", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found
{code}
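For comparison, the documentation linked above writes the same hints in SQL comment form, where column names appear as bare identifiers. The sketch below only builds such a query string in plain Python. The helper name `hinted_select` and the view name `ids` are hypothetical and not part of Spark, and whether the SQL route avoids the error on the affected versions is an assumption that this report does not verify.

```python
def hinted_select(view_name, hint, *params):
    """Build a SELECT statement that carries a partitioning hint.

    Hypothetical helper for illustration only; it is not a Spark API.
    It performs no quoting or validation of identifiers.
    """
    # Join hint parameters (partition counts and/or column names) as written.
    args = ", ".join(str(p) for p in params)
    # A hint without parameters is emitted as a bare name, e.g. REBALANCE.
    hint_text = f"{hint.upper()}({args})" if args else hint.upper()
    return f"SELECT /*+ {hint_text} */ * FROM {view_name}"


# "ids" is an assumed temporary view name standing in for the DataFrame above.
query = hinted_select("ids", "rebalance", "id")
print(query)
```

With a live session, the string could be passed to `spark.sql(query)` after registering the DataFrame with `df.createOrReplaceTempView("ids")`.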