Posted to dev@spark.apache.org by Alessandro Baretta <al...@gmail.com> on 2015/01/01 02:48:59 UTC

Spark driver main thread hanging after SQL insert

Here's what the console shows:

15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0,
whose tasks have all completed, from pool
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at
ParquetTableOperations.scala:326) finished in 5493.549 s
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Job 41 finished: runJob at
ParquetTableOperations.scala:326, took 5493.747061 s

It is now 01:40:03, so the driver has been hanging for the last 28 minutes.
The web UI, on the other hand, shows that all tasks completed successfully,
and the output directory has been populated, although the _SUCCESS file is
missing.
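
If I understand correctly, the _SUCCESS marker is written by the Hadoop
output committer only during job commit, after all tasks have finished, so
its absence suggests the job never reached, or never completed, the commit
step. A minimal way to check for it (a sketch only; outputDir is a
hypothetical stand-in for the table's Parquet output directory):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Sketch, not from the job above: returns true once commitJob has
  // written the _SUCCESS marker into the given output directory.
  def hasSuccessMarker(outputDir: String): Boolean = {
    val fs = FileSystem.get(new Configuration())
    fs.exists(new Path(outputDir, "_SUCCESS"))
  }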

It is worth noting that my code runs each of these inserts in its own
thread. The actual code looks like the following snippet, modulo some
simplifications.

  def save_to_parquet(allowExisting: Boolean = false): Unit = {
    // Start one writer thread per table, then wait for all of them.
    val threads = tables.map { table =>
      val thread = new Thread {
        override def run(): Unit = {
          table.insertInto(table.table_name)
        }
      }
      thread.start()
      thread
    }
    threads.foreach(_.join())
  }

As far as I can see, the insertInto call never returns. Any idea why?
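
To at least see where it is stuck, a variant that joins each thread with a
timeout and prints the stack of any thread still alive might help (a sketch
only, reusing the tables collection from the snippet above):

  // Sketch: join each writer thread with a timeout; if one is still
  // alive afterwards, dump its current stack to show where the
  // insertInto call is blocked.
  def save_to_parquet_diagnostic(timeoutMs: Long = 30 * 60 * 1000L): Unit = {
    val threads = tables.map { table =>
      val thread = new Thread {
        override def run(): Unit = table.insertInto(table.table_name)
      }
      thread.start()
      thread
    }
    threads.foreach { t =>
      t.join(timeoutMs)
      if (t.isAlive) {
        println(s"${t.getName} still running after $timeoutMs ms:")
        t.getStackTrace.foreach(frame => println(s"  at $frame"))
      }
    }
  }

Running jstack against the driver process should show the same blocked
frames.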

Alex

Re: Spark driver main thread hanging after SQL insert

Posted by Alessandro Baretta <al...@gmail.com>.
Patrick,

Sure. I was interested in knowing whether anyone had experienced a similar
issue and whether there was any known workaround. In any case, I will
report it on JIRA.

Alex
On Jan 2, 2015 9:13 AM, "Patrick Wendell" <pw...@gmail.com> wrote:

> Hi Alessandro,
>
> Can you create a JIRA for this rather than reporting it on the dev
> list? That's where we track issues like this. Thanks!
>
> - Patrick

Re: Spark driver main thread hanging after SQL insert

Posted by Patrick Wendell <pw...@gmail.com>.
Hi Alessandro,

Can you create a JIRA for this rather than reporting it on the dev
list? That's where we track issues like this. Thanks!

- Patrick

