Posted to user@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2021/05/24 17:40:22 UTC

Re: About Spark executes sqlscript

It's not possible to load everything into memory. We should use a BigQuery
connector (one should exist already?) and register tables B and C as temp
views in Spark.
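
For example, with the spark-bigquery connector on the classpath, the whole
script collapses into a read of the source tables, a temp-view registration,
and an overwrite of the target table. A rough sketch in Scala; the dataset,
table and bucket names are placeholders, and the connector options should be
double-checked against the connector docs for your version:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sqlscript-on-bigquery")
  .getOrCreate()

// Read the source tables through the BigQuery connector. Spark reads lazily
// and the connector pushes down column pruning, so the full tables are not
// materialized up front.
val tableB = spark.read.format("bigquery")
  .option("table", "my_dataset.tableB")      // placeholder table id
  .load()
val tableC = spark.read.format("bigquery")
  .option("table", "my_dataset.tableC")
  .load()

// Register temp views so the SELECT from the sqlscript runs unchanged.
tableB.createOrReplaceTempView("tableB")
tableC.createOrReplaceTempView("tableC")

val result = spark.sql(
  "select b.columnB1, c.columnC2 from tableB b, tableC c")

// "delete from tableA" followed by "insert into tableA select ..." is just
// an overwrite of tableA.
result.write.format("bigquery")
  .option("table", "my_dataset.tableA")
  .option("temporaryGcsBucket", "my-staging-bucket")  // placeholder bucket
  .mode("overwrite")
  .save()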

On Fri, May 14, 2021 at 8:50 AM bo zhao <zh...@gmail.com> wrote:

> Hi Team,
>
> I've followed the Spark community for several years. This is my first time
> asking for help. I hope you can share some experience.
>
> I want to develop a Spark application that processes a sqlscript file.
> The data is in BigQuery.
> For example, the sqlscript is:
>
> delete from tableA;
> insert into tableA select b.columnB1, c.columnC2 from tableB b, tableC c;
>
>
> I can parse this file. In my opinion, after parsing the file, the steps
> should be as follows:
>
> #step1: read tableB and tableC into memory (Spark)
> #step2: register temp views for tableB's dataframe and tableC's dataframe
> #step3: use spark.sql("select b.columnB1, c.columnC2 from tableB b, tableC
> c") to get a new dataframe
> #step4: use the new dataframe's write() with "overwrite" mode to save it to
> tableA
>
> My questions:
> #1 If there are 10 or more tables, do I need to read each table into
> memory, given that Spark is based on in-memory computation?
> #2 Is there an easier way to handle my scenario? For example, I would just
> define the data source (BigQuery) and parse the sqlscript file, and the
> rest would be run by Spark.
>
> Please share your experience or ideas.
>
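
As for question #2 above: once the tables the script touches are registered
as temp views, the driver loop can stay generic and you don't have to
hand-code each statement. A very rough sketch, reusing the spark session from
the previous snippet; the regex-based statement splitting and the
dataset/bucket names are simplified placeholders, not something a real
deployment should rely on:

import scala.io.Source

// Naive split on ';' -- a real SQL parser should be used instead.
val script = Source.fromFile("job.sqlscript").mkString
val statements = script.split(";").map(_.trim).filter(_.nonEmpty)

// "insert into <table> select ..." -> capture the target table and the SELECT.
val insertInto = """(?is)insert\s+into\s+(\w+)\s+(select\b.*)""".r

statements.foreach {
  case stmt if stmt.toLowerCase.startsWith("delete from") =>
    // Nothing to do here: the matching insert below overwrites the table.
  case insertInto(target, select) =>
    spark.sql(select).write.format("bigquery")
      .option("table", s"my_dataset.$target")            // placeholder dataset
      .option("temporaryGcsBucket", "my-staging-bucket")  // placeholder bucket
      .mode("overwrite")
      .save()
  case other =>
    spark.sql(other)   // anything else is handed to Spark SQL as-is
}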