Posted to dev@spark.apache.org by Micah Kornfield <em...@gmail.com> on 2022/07/05 19:17:28 UTC

Re: Write to big query via bq connector is slow

Hi Abhinav,
This question is probably best asked in the Spark BigQuery connector
project you linked; the connector is not part of the core Spark implementation.

Cheers,
Micah


On Tue, Jun 28, 2022 at 12:12 AM Abhinav Ranjan <ab...@gmail.com>
wrote:

> Hi,
>
> I am trying to write to a BigQuery table via the spark-bigquery-connector
> using the two methods below:
> Ref Page: https://github.com/GoogleCloudDataproc/spark-bigquery-connector
>
> 1. ==>  df.write.format("bigquery").save("dataset.table")
>
> 2. ==> df.write.format("bigquery").option("writeMethod",
> "direct").save("dataset.table")
>
> The execution time of both methods is roughly the same.
> However, when I create the table directly in the GCP console, it completes
> in a fraction of the time taken by the two methods above.
>
> Are there optimizations possible when writing the dataset to BQ via
> Spark that would make it comparable to the native BQ experience?
>
> Thanks & Regards,
> Abhinav Ranjan
>
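For readers landing on this thread: the two invocations differ in mechanism. The default ("indirect") method stages the DataFrame as files in a GCS bucket and then triggers a BigQuery load job, which is why it requires a `temporaryGcsBucket` option; the "direct" method streams rows through the BigQuery Storage Write API. A minimal sketch of how the options fit together, assuming the helper `bq_write_options` (hypothetical, not part of the connector):

```python
# Sketch only: bq_write_options is a hypothetical helper illustrating the
# two write paths of the spark-bigquery-connector. The option names
# ("writeMethod", "temporaryGcsBucket") are the connector's documented ones.

def bq_write_options(write_method, temp_bucket=None):
    """Build the .option(...) map for a BigQuery write.

    "indirect" (the default) stages files in a GCS bucket and then runs a
    BigQuery load job, so it needs a staging bucket. "direct" streams rows
    via the BigQuery Storage Write API and needs no bucket.
    """
    if write_method not in ("direct", "indirect"):
        raise ValueError("write_method must be 'direct' or 'indirect'")
    opts = {"writeMethod": write_method}
    if write_method == "indirect":
        if temp_bucket is None:
            raise ValueError("indirect writes need a temporaryGcsBucket")
        opts["temporaryGcsBucket"] = temp_bucket
    return opts


# Hypothetical usage with a DataFrame `df` (not run here):
#
#   writer = df.write.format("bigquery")
#   for k, v in bq_write_options("indirect", "my-staging-bucket").items():
#       writer = writer.option(k, v)
#   writer.save("dataset.table")
```

Because the two paths do very different work (bulk file load vs. per-row streaming), comparable timings for both are plausible for small tables where job setup dominates; the connector project's issue tracker is the right place to dig into why either path is slower than a native load.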