Posted to issues@spark.apache.org by "Bjørn Jørgensen (Jira)" <ji...@apache.org> on 2022/04/21 17:56:00 UTC

[jira] [Commented] (SPARK-37174) WARN WindowExec: No Partition Defined is being printed 4 times.

    [ https://issues.apache.org/jira/browse/SPARK-37174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525911#comment-17525911 ] 

Bjørn Jørgensen commented on SPARK-37174:
-----------------------------------------

I'm attaching a file now with the info messages I get when I run df.info().

Spark master, built from last week.

I will raise this to a bug for Spark 3.3.

df.shape
(763300, 224) 



> WARN WindowExec: No Partition Defined is being printed 4 times. 
> ----------------------------------------------------------------
>
>                 Key: SPARK-37174
>                 URL: https://issues.apache.org/jira/browse/SPARK-37174
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>         Attachments: info.txt
>
>
> Hi, I use this code:
> {code:python}
> import re
> import pyspark.pandas as ps
>
> # `spark` is the active SparkSession (e.g. from the pyspark shell)
> f01 = spark.read.json("/home/test_files/falk/flatted110721/F01.json/*.json")
> pf01 = f01.to_pandas_on_spark()
> pf01 = pf01.rename(columns=lambda x: re.sub(':P$', '', x))
> pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"] = ps.to_datetime(pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"])
> pf01.info()
> {code}
>  
> Sometimes it prints:
>   
> {code:java}
>  21/10/31 20:38:04 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  21/10/31 20:38:04 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
>  21/10/31 20:38:08 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  /opt/spark/python/pyspark/sql/pandas/conversion.py:214: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
>    df[column_name] = series
>  /opt/spark/python/pyspark/pandas/utils.py:967: UserWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
>    warnings.warn(message, UserWarning)
>  21/10/31 20:38:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  21/10/31 20:38:18 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.{code}
>  
> And some other times it "just" prints:
>   
> {code:java}
>  21/10/31 21:24:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  21/10/31 21:24:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  21/10/31 21:24:22 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
>  21/10/31 21:24:24 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.{code}
> Why does it print df[column_name] = series?
>
> Can we remove the /opt/spark/python/pyspark/pandas/utils.py:967: line?
> And the warnings.warn(message, UserWarning) line?
> And three of the four duplicate "WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation." lines?
>  
>  
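As far as I can tell, the repeated WindowExec warning comes from the pandas-on-Spark default "sequence" index, which is computed with a window over a single partition; setting the documented option ps.set_option("compute.default_index_type", "distributed") should avoid that window entirely. Until the duplication itself is fixed, a sketch of a Log4j 2 snippet that silences just this logger (assuming Spark 3.3's stock conf/log4j2.properties layout):

```properties
# Sketch, assuming the default Spark 3.3 conf/log4j2.properties template.
# Raises the threshold for the WindowExec logger only, so the repeated
# "No Partition Defined" WARN lines are no longer printed.
logger.windowexec.name = org.apache.spark.sql.execution.window.WindowExec
logger.windowexec.level = error
```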



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org