Posted to issues@spark.apache.org by "Maxwell Conradt (Jira)" <ji...@apache.org> on 2022/08/22 11:51:00 UTC

[jira] [Created] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark

Maxwell Conradt created SPARK-40178:
---------------------------------------

             Summary: Rebalance/Repartition Hints Not Working in PySpark
                 Key: SPARK-40178
                 URL: https://issues.apache.org/jira/browse/SPARK-40178
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.2, 3.3.0, 3.2.1, 3.2.0
         Environment: macOS 11.4 Big Sur; Python 3.9.7; Spark version >= 3.2.0 (possibly earlier versions as well)
            Reporter: Maxwell Conradt
             Fix For: 3.4.0, 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.2.0


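Partitioning hints that take column arguments (such as `rebalance` and `repartition`) fail in PySpark with an `AnalysisException`. The column parameters are passed through as plain strings and are not converted to Catalyst `Expression` instances before being handed to the hint resolver.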

The behavior of the hints is documented [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types].

Example:

{code:python}
>>> df = spark.range(1024)
>>> 
>>> df
DataFrame[id: bigint]
>>> df.hint("rebalance", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
>>> df.hint("repartition", "id")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
    jdf = self._jdf.hint(name, self._jseq(parameters))
  File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found
{code}
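
A possible workaround, sketched under assumptions: the SQL form of the same hints (documented at the link above) is parsed by Spark itself, so it should not hit the string-to-column conversion problem seen with `DataFrame.hint`. The helper names `hinted_select` and `apply_hint` and the view name `hint_tmp_view` are hypothetical and not part of PySpark. This has not been verified against every affected version.

```python
# Hypothetical helpers (not part of PySpark): express the partitioning hint in
# SQL comment form so that Spark's SQL parser resolves the column names.
COLUMN_HINTS = {"REBALANCE", "REPARTITION", "REPARTITION_BY_RANGE"}


def hinted_select(view_name, hint, *columns):
    """Build a SELECT over `view_name` that carries a partitioning hint."""
    name = hint.upper()
    if name not in COLUMN_HINTS:
        raise ValueError(f"unsupported partitioning hint: {hint}")
    if not columns:
        # Hint without arguments, e.g. /*+ REBALANCE */
        return f"SELECT /*+ {name} */ * FROM {view_name}"
    args = ", ".join(columns)
    return f"SELECT /*+ {name}({args}) */ * FROM {view_name}"


def apply_hint(spark, df, hint, *columns, view_name="hint_tmp_view"):
    """Apply the hint via spark.sql; needs a live SparkSession.

    Side effect: registers (or replaces) a temporary view named `view_name`.
    """
    df.createOrReplaceTempView(view_name)
    return spark.sql(hinted_select(view_name, hint, *columns))
```

In a PySpark shell this would be used as `apply_hint(spark, df, "rebalance", "id")`. For plain repartitioning by column, `df.repartition("id")` is also available directly on the DataFrame API.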