Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/08/01 07:13:00 UTC

[jira] [Commented] (SPARK-28575) Spark job time increasing when upgrading Spark from 2.1.1 to 2.3.1

    [ https://issues.apache.org/jira/browse/SPARK-28575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897826#comment-16897826 ] 

Hyukjin Kwon commented on SPARK-28575:
--------------------------------------

Please provide a reproducer (self-contained if possible) and fix the title/description to properly describe the issue. Otherwise, no one knows what the issue is.

> Spark job time increasing when upgrading Spark from 2.1.1 to 2.3.1
> ------------------------------------------------------------------
>
>                 Key: SPARK-28575
>                 URL: https://issues.apache.org/jira/browse/SPARK-28575
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.3.1
>            Reporter: Kushal Mahajan
>            Priority: Major
>
> I am running a Spark job on a standalone cluster with Spark 2.1.1. The standalone cluster was upgraded from Spark 2.1.1 to Spark 2.3.1, and there was a considerable drop in performance (~3-4x) in the Spark job. Upon investigation, I found that there is a considerable time lag (ranging from 30 sec to 2 min) between the start times of different Spark actions, excluding the time taken by the actions themselves (as can be seen from the start time of each job on the Spark UI page). This was not the case in Spark 2.1.1. Can anybody tell what the issue is here?
> PS: I am reading multiple text files from S3 using wholeTextFiles, creating multiple dataframes for those text files, and writing them out to S3 in CSV format.
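A self-contained skeleton for the requested reproducer, based on the workload described in the PS, might look like the sketch below. The bucket names, paths, and object structure are placeholders (not taken from the report), and it assumes a Spark 2.3.1 standalone cluster with an S3A-capable Hadoop configuration:

```scala
// Hypothetical reproducer sketch for SPARK-28575: read whole text files
// from S3, build one DataFrame per file, write each back out as CSV.
// All paths below are illustrative placeholders.
import org.apache.spark.sql.SparkSession

object WholeTextFilesRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-28575-repro")
      .getOrCreate()
    import spark.implicits._

    val inputPaths = Seq(
      "s3a://my-bucket/in/a.txt",
      "s3a://my-bucket/in/b.txt"
    )

    inputPaths.foreach { path =>
      // wholeTextFiles yields (path, content) pairs, one per file.
      val df = spark.sparkContext
        .wholeTextFiles(path)
        .toDF("path", "content")

      // Each write is a separate action; the gap between the start times
      // of these actions is where the reported 30 sec - 2 min lag shows up.
      df.write.mode("overwrite")
        .csv(path.replace("/in/", "/out/"))
    }

    spark.stop()
  }
}
```

Timing the loop body (e.g. with `System.nanoTime` around each write) on both 2.1.1 and 2.3.1 against the same input set would make the per-action lag concrete enough to diagnose.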



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org