Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/07/05 02:50:00 UTC

[jira] [Commented] (SPARK-39603) Dataset planning in a unit test takes a very long time to finish (e.g. >8mins for complex job)

    [ https://issues.apache.org/jira/browse/SPARK-39603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562327#comment-17562327 ] 

Hyukjin Kwon commented on SPARK-39603:
--------------------------------------

Mind showing the reproducer? It's very difficult to assess the problem with just text here.

> Dataset planning in a unit test takes a very long time to finish (e.g. >8mins for complex job)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39603
>                 URL: https://issues.apache.org/jira/browse/SPARK-39603
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Tanin Na Nakorn
>            Priority: Major
>
> At Stripe, we have a very complex data job. The unit test was running fine when we used RDD.
> After we switched to Dataset, the unit test takes considerably longer (e.g. > 8 mins just for planning).
> Most of our unit tests only process 1-2 records.
> We have tried to investigate it a bit, and we are somewhat sure it's the planning phase.
> We tried disabling almost all optimizers except the ~10 optimizers that can't be disabled. It doesn't impact the test run time at all.
> Is there a way to make Dataset planning faster in unit tests?
> Thank you!
> (Please excuse us. We may be using inaccurate terms.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
