Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/04/05 03:23:00 UTC

[jira] [Updated] (HUDI-3724) Too many open files w/ COW spark long running tests

     [ https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3724:
-----------------------------
    Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Mar-23  (was: Hudi-Sprint-Mar-22)

> Too many open files w/ COW spark long running tests
> ---------------------------------------------------
>
>                 Key: HUDI-3724
>                 URL: https://issues.apache.org/jira/browse/HUDI-3724
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> We run integ tests against Hudi, and recently our Spark long-running tests have been failing for the COW table with "too many open files". We may have some file handle leaks that need to be tracked down and closed out.
> {code:java}
> 	... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor driver): java.io.FileNotFoundException: /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3 (Too many open files)
> 	at java.io.FileOutputStream.open0(Native Method)
> 	at java.io.FileOutputStream.open(FileOutputStream.java:270)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) {code}
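> One way to confirm whether descriptors are leaking over the course of the long-running test (not part of the original report, just a diagnostic sketch) is to sample the process's open file descriptor count periodically via the JMX `UnixOperatingSystemMXBean`; a steadily climbing count between commits would point at a leak rather than an undersized ulimit:
> {code:java}
> import java.lang.management.ManagementFactory;
> import java.lang.management.OperatingSystemMXBean;
> import com.sun.management.UnixOperatingSystemMXBean;
>
> public class FdSampler {
>     public static void main(String[] args) {
>         OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
>         if (os instanceof UnixOperatingSystemMXBean) {
>             UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
>             // Log this from the driver/executor on a timer; a monotonically
>             // increasing count across table commits suggests a handle leak.
>             System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
>                 + " / max: " + unix.getMaxFileDescriptorCount());
>         } else {
>             // Non-Unix platforms do not expose fd counts through this bean.
>             System.out.println("fd counts not available on this platform");
>         }
>     }
> }
> {code}
> (Hedged: `FdSampler` is a hypothetical helper name; the `com.sun.management` bean is JDK-specific but present on HotSpot/OpenJDK on Linux, which matches the `ip-10-0-40-161.us-west-1.compute.internal` host above.)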



--
This message was sent by Atlassian Jira
(v8.20.1#820001)