Posted to issues@spark.apache.org by "Danny Guinther (Jira)" <ji...@apache.org> on 2022/04/05 21:07:00 UTC

[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

    [ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517704#comment-17517704 ] 

Danny Guinther commented on SPARK-38792:
----------------------------------------

I don't know if it's helpful, but the application runs in a hosted Databricks workspace in Azure.

I have tried deploying the upgrade to 3.2.1 several times in the last month and it behaves this way every time, so this is not just a fluke of bad timing.

> Regression in time executor takes to do work since v3.0.1 ?
> -----------------------------------------------------------
>
>                 Key: SPARK-38792
>                 URL: https://issues.apache.org/jira/browse/SPARK-38792
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Danny Guinther
>            Priority: Major
>         Attachments: what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application, since the upgrade from 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it shows up in my application because of the high volume of work it does, but I could be mistaken.
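> To be concrete about "purely a configuration change": the application code is untouched and only the Spark version it targets changes, roughly like the following (illustrative sketch; assuming an sbt-style build with Spark provided by the cluster, the exact build setup isn't important here):
> {code:scala}
> // Before: built and run against Spark 3.0.1
> // libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1" % Provided
> 
> // After: only the version string changes (the cluster runtime is upgraded to match)
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided
> {code}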
> The gist is that the executor actions I'm running appear to take a lot longer on Spark 3.2.1. I have no ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this, or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. I also looked through [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but I haven't seen anything that sticks out as a possible source of the problem.
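> For what it's worth, the tuning-guide adjustments I tried look roughly like this (illustrative sketch only, not the exact values from our job; none of these changed the picture):
> {code:scala}
> import org.apache.spark.sql.SparkSession
> 
> // Sketch of the kind of tuning-guide settings experimented with; actual values varied per run.
> val spark = SparkSession.builder()
>   .appName("pipeline")
>   // Kryo serialization, as suggested in the tuning guide
>   .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   // Memory fraction and shuffle-partition experiments
>   .config("spark.memory.fraction", "0.6")
>   .config("spark.sql.shuffle.partitions", "200")
>   .getOrCreate()
> {code}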
> I have attached a graph that shows the drastic change in time taken by executor actions. In the image, the blue and purple lines are different kinds of reads using the built-in JDBC data reader, and the green line is writes using a custom-built data writer. The deploy switching from 3.0.1 to 3.2.1 occurred at 9 AM on the graph. The graph data comes from timing blocks that surround only the calls to DataFrame actions, so there shouldn't be anything specific to my application that is suddenly inflating these numbers.
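> For clarity, the timing blocks are thin wrappers around the action calls themselves, along these lines (simplified sketch; the metric names, JDBC options, and reporting call here are placeholders, not our real code):
> {code:scala}
> import org.apache.spark.sql.{Row, SparkSession}
> 
> // Simplified version of the timing wrapper: only the action itself runs inside the timer.
> def timed[T](metric: String)(block: => T): T = {
>   val start = System.nanoTime()
>   try block
>   finally {
>     val elapsedMs = (System.nanoTime() - start) / 1e6
>     println(s"$metric took $elapsedMs ms") // the real code reports to our metrics system
>   }
> }
> 
> def readBatch(spark: SparkSession, jdbcUrl: String, query: String): Array[Row] = {
>   val df = spark.read
>     .format("jdbc")
>     .option("url", jdbcUrl)
>     .option("query", query)
>     .load()
>   // Only the action call is timed, not the surrounding application logic.
>   timed("jdbc.read")(df.collect())
> }
> {code}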
> The driver process does seem to be seeing more GC churn than it did with Spark 3.0.1, but I don't think that explains this behavior. The executors don't seem to have any problem with memory or GC and are not overutilized (our pipeline is very read- and write-heavy and lighter on transformations, so executors tend to be idle while waiting on various network I/O).
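> I'm happy to collect detailed GC logs if that would help; I'd enable them through the cluster's Spark config with something along these lines (sketch; exact flags depend on the JVM version in use):
> {code}
> spark.driver.extraJavaOptions   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> {code}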
>  
> Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org