Posted to user@spark.apache.org by Daniel Haviv <da...@veracity-group.com> on 2015/10/12 11:52:37 UTC

SQLContext within foreachRDD

Hi,
Since the code inside foreachRDD runs at the driver, does that mean that
if we use SQLContext inside foreachRDD the data is sent back to the driver
and only then the query is executed, or is it executed at the executors?


Thank you.
Daniel

Re: SQLContext within foreachRDD

Posted by Daniel Haviv <da...@veracity-group.com>.
Just wanted to make sure.

Thanks.
Daniel

On Mon, Oct 12, 2015 at 1:07 PM, Adrian Tanase <at...@adobe.com> wrote:

> Not really, unless you’re doing something wrong (e.g. calling collect or
> similar).
>
> In the foreachRDD body you’re typically registering a temp table by
> converting an RDD to a DataFrame. All subsequent queries are executed in
> parallel on the workers.
>
> I haven’t built production apps with this pattern, but I have successfully
> built a prototype where I execute dynamic SQL on top of a 15-minute window
> (obtained with .window on the DStream), and it works as expected.
>
> Check this out for a code example:
> https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala
>
> -adrian

Re: SQLContext within foreachRDD

Posted by Adrian Tanase <at...@adobe.com>.
Not really, unless you’re doing something wrong (e.g. calling collect or similar).

In the foreachRDD body you’re typically registering a temp table by converting an RDD to a DataFrame. The function you pass to foreachRDD is invoked on the driver, but it only sets up the work: all subsequent queries are executed in parallel on the workers.
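
A minimal sketch of that pattern, written against the Spark 1.5-era API (SQLContext.getOrCreate, registerTempTable). The Event case class, the socket source on localhost:9999, the "user,bytes" line format and the table name are assumptions made up for illustration; they are not taken from the linked reference app.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type, used only for this illustration.
// It is declared at the top level so that toDF() can derive a schema from it.
case class Event(user: String, bytes: Long)

object ForeachRddSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ForeachRddSqlSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed input: text lines of the form "user,bytes" on a local socket.
    // Lines with a non-numeric second field would fail in toLong; a real job
    // would validate them first.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => Event(fields(0), fields(1).trim.toLong))

    events.foreachRDD { rdd =>
      // This function body is invoked on the driver once per batch, but it
      // only describes the work. The RDD's partitions stay on the executors.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Converting to a DataFrame and registering it moves no data.
      rdd.toDF().registerTempTable("events")

      val totals = sqlContext.sql(
        "SELECT user, SUM(bytes) AS total FROM events GROUP BY user")

      // show() is an action: the aggregation runs on the executors and only
      // the first 20 result rows are brought back to the driver for printing.
      totals.show()

      // By contrast, calling rdd.collect() here would pull the whole batch
      // to the driver, which is the mistake mentioned above.
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On Spark 2.x and later the same idea would use SparkSession and createOrReplaceTempView in place of SQLContext and registerTempTable.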

I haven’t built production apps with this pattern, but I have successfully built a prototype where I execute dynamic SQL on top of a 15-minute window (obtained with .window on the DStream), and it works as expected.
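
A rough sketch of what such a windowed prototype might look like, again against the Spark 1.5-era API. This is not the prototype itself: the socket source, the 30-second batch and slide intervals, the table name window_lines, and passing the query as the first program argument are all assumptions chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowedSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical "dynamic" query: taken from the first program argument,
    // with a default so the sketch also runs without arguments.
    val query = args.headOption.getOrElse("SELECT COUNT(*) AS n FROM window_lines")

    val conf = new SparkConf().setAppName("WindowedSqlSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Assumed source: plain text lines on a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Each RDD handed to foreachRDD now covers the last 15 minutes of input.
    // Window and slide durations must be multiples of the batch interval.
    val windowed = lines.window(Minutes(15), Seconds(30))

    windowed.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Single-column DataFrame; the column name "line" is an assumption.
      rdd.toDF("line").registerTempTable("window_lines")

      // The query is planned on the driver and executed on the executors.
      sqlContext.sql(query).show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```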

Check this out for a code example: https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

-adrian