Posted to issues@spark.apache.org by "Bjørn Jørgensen (Jira)" <ji...@apache.org> on 2022/04/21 17:57:00 UTC
[jira] (SPARK-37174) WARN WindowExec: No Partition Defined is being printed 4 times.
[ https://issues.apache.org/jira/browse/SPARK-37174 ]
Bjørn Jørgensen deleted comment on SPARK-37174:
-----------------------------------------------
was (Author: bjornjorgensen):
I am adding a file now with the info message I get when I run df.info().
This is a Spark master build from last week.
I will raise this to a bug for Spark 3.3.
df.shape
(763300, 224)
> WARN WindowExec: No Partition Defined is being printed 4 times.
> ----------------------------------------------------------------
>
> Key: SPARK-37174
> URL: https://issues.apache.org/jira/browse/SPARK-37174
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.3.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: info.txt
>
>
> Hi, I use this code:
> {code:java}
> import re
> import pyspark.pandas as ps
>
> f01 = spark.read.json("/home/test_files/falk/flatted110721/F01.json/*.json")
> pf01 = f01.to_pandas_on_spark()
> pf01 = pf01.rename(columns=lambda x: re.sub(':P$', '', x))
> pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"] = ps.to_datetime(pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"])
> pf01.info(){code}
>
> Sometimes it prints:
>
> {code:java}
> 21/10/31 20:38:04 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 21/10/31 20:38:04 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> 21/10/31 20:38:08 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> /opt/spark/python/pyspark/sql/pandas/conversion.py:214: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
> df[column_name] = series
> /opt/spark/python/pyspark/pandas/utils.py:967: UserWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
> warnings.warn(message, UserWarning)
> 21/10/31 20:38:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 21/10/31 20:38:18 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.{code}
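The middle two lines of that log are Python-level warnings rather than Spark log output, so they can at least be muted from the client side. A minimal sketch of that idea, assuming pandas is importable on the driver (this is a workaround one could try locally, not an official fix for the issue):

```python
import warnings

from pandas.errors import PerformanceWarning  # pandas defines PerformanceWarning here

# Sketch of a client-side workaround (an assumption, not an official fix):
# filter out the two Python-level warnings quoted above. The WindowExec
# lines come from the JVM-side logger, so this does not affect them.
warnings.filterwarnings("ignore", category=PerformanceWarning)
warnings.filterwarnings(
    "ignore",
    message=r".*loads all data into the driver's memory.*",
    category=UserWarning,
)

# Demonstration: neither of these now reaches the user.
warnings.warn("DataFrame is highly fragmented.", PerformanceWarning)
warnings.warn("`to_pandas` loads all data into the driver's memory.", UserWarning)
print("python-side warnings suppressed")
```

This only hides the symptoms on the user's side; the fragmentation and driver-memory concerns the warnings describe are still real.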
>
> and other times it "just" prints:
>
> {code:java}
> 21/10/31 21:24:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 21/10/31 21:24:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 21/10/31 21:24:22 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
> 21/10/31 21:24:24 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.{code}
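Unlike the Python warnings, these WARN lines are emitted by Spark's JVM-side logger for org.apache.spark.sql.execution.window.WindowExec, so Python's warnings module cannot filter them. One hedged workaround (an assumption about a typical Spark 3.3 setup, which uses Log4j 2 with conf/log4j2.properties, not an official recommendation) is to raise just that logger's level:

```
# Sketch, assuming a Log4j 2 properties file (conf/log4j2.properties in Spark 3.3+):
# raise only the WindowExec logger to ERROR so the repeated warning is hidden,
# leaving all other WARN output intact.
logger.windowexec.name = org.apache.spark.sql.execution.window.WindowExec
logger.windowexec.level = error
```

Note that this silences a genuine performance signal: the window operation really is moving all data to a single partition each time it runs.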
> Why does it print df[column_name] = series?
>
> Can we remove the /opt/spark/python/pyspark/pandas/utils.py:967: line,
> the warnings.warn(message, UserWarning) line,
> and three of the four "WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation." messages?
>
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org