Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2015/09/15 23:24:46 UTC

[jira] [Resolved] (SPARK-5060) Spark driver main thread hanging after SQL insert in Parquet file

     [ https://issues.apache.org/jira/browse/SPARK-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-5060.
-------------------------------------
    Resolution: Cannot Reproduce

This code has changed a lot in Spark 1.5, so I'm going to close this ticket.  Please reopen if you can still reproduce.

> Spark driver main thread hanging after SQL insert in Parquet file
> -----------------------------------------------------------------
>
>                 Key: SPARK-5060
>                 URL: https://issues.apache.org/jira/browse/SPARK-5060
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Alex Baretta
>
> Here's what the console shows:
> 15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0, whose tasks have all completed, from pool
> 15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at ParquetTableOperations.scala:326) finished in 5493.549 s
> 15/01/01 01:12:29 INFO scheduler.DAGScheduler: Job 41 finished: runJob at ParquetTableOperations.scala:326, took 5493.747061 s
> It is now 01:40:03, so the driver has been hanging for the last 28 minutes. The web UI, on the other hand, shows that all tasks completed successfully, and the output directory has been populated, although the _SUCCESS file is missing.
> It is worth noting that my code started this job in its own thread. The actual code looks like the following snippet, modulo some simplifications.
>   // Start one thread per table, kick off the insert, then wait for all of them.
>   def save_to_parquet(allowExisting: Boolean = false) = {
>     val threads = tables.map(table => {
>       val thread = new Thread {
>         override def run(): Unit = {
>           table.insertInto(table.table_name)
>         }
>       }
>       thread.start()
>       thread
>     })
>     threads.foreach(_.join())
>   }
> As far as I can see, the insertInto call never returns.
> The version of Spark I'm using is built from master, off of this commit:
> commit 815de54002f9c1cfedc398e95896fa207b4a5305
> Author: YanTangZhai <ha...@tencent.com>
> Date:   Mon Dec 29 11:30:54 2014 -0800
>     [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
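
For reference, a bounded variant of the fan-out/join pattern described in the report above can surface a stuck insert instead of blocking the driver indefinitely. This is only a minimal sketch, not the reporter's code: TableHandle, its insert function, and the timeout value are hypothetical stand-ins; only the thread-per-table structure mirrors the quoted snippet.

    // Minimal sketch only: TableHandle and its insert function are hypothetical
    // stand-ins; only the thread-per-table fan-out mirrors the quoted snippet.
    object BoundedFanOut {
      // Hypothetical pairing of a target table name with the action that inserts into it.
      final case class TableHandle(tableName: String, insert: String => Unit)

      def saveToParquet(tables: Seq[TableHandle], timeoutMillis: Long): Unit = {
        // Start one thread per table, as in the report.
        val threads = tables.map { table =>
          val thread = new Thread(s"insert-${table.tableName}") {
            override def run(): Unit = table.insert(table.tableName)
          }
          thread.start()
          thread
        }
        // Join with a timeout so a hung insert is reported rather than waited on forever.
        threads.foreach { thread =>
          thread.join(timeoutMillis)
          if (thread.isAlive)
            println(s"WARNING: ${thread.getName} is still running after $timeoutMillis ms")
        }
      }
    }

With a bounded join, a hang like the one described would show up as a warning per still-running insert thread instead of a silently blocked driver main thread.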



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org