Posted to user@flink.apache.org by Niels Basjes <Ni...@basjes.nl> on 2019/05/31 08:14:57 UTC

BigQuery source ?

Hi,

Has anyone created a source to READ from BigQuery into Flink yet (we
have Flink running on K8S in the Google cloud)?
I would like to retrieve a DataSet in a distributed way (the data ...
it's kinda big) and process that with Flink running on k8s (which we
have running already).

So far I have not been able to find anything.
Any pointers/hints/code fragments are welcome.

Thanks

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: BigQuery source ?

Posted by Richard Deurwaarder <ri...@xeli.eu>.
I looked into this briefly a while ago out of interest and read about
how Beam handles it. I've never actually implemented it, but the concept
sounds reasonable to me.

From what I read in their code, Beam exports the BigQuery data to
Google Cloud Storage. This export shards the data into files with a max
size of 1GB, and these files are then processed by the 'source functions'
in Beam.

I think implementing this in Flink would require the following:

* Before starting the Flink job, run the BigQuery to Google Cloud Storage
export (https://cloud.google.com/bigquery/docs/exporting-data)
* Start the Flink job and point it at the Google Cloud Storage files (using
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage to
easily read from Google Cloud Storage buckets)

So the job might look something like this:

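> // Run the export first; returns the paths of the exported shard files.
> List<String> files = doBigQueryExportJob();
> DataSet<String> records = environment.fromCollection(files)
>                 // fromCollection is a non-parallel source, so spread the paths over all tasks.
>                 .rebalance()
>                 .flatMap(new ReadFromFile())
>                 .map(doWork());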
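
To make the shape of that pipeline a bit more concrete, here is a minimal plain-Java sketch with no Flink dependency. It is not Flink code: a parallel stream stands in for the parallel tasks, and the names ShardPipelineSketch, readShard and process are made up for illustration. It also assumes the export wrote newline-delimited records (one record per line) and that the shard files are reachable as ordinary paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ShardPipelineSketch {

    // Hypothetical stand-in for ReadFromFile: turns one exported shard
    // into a lazy stream of its records (assumed to be one per line).
    public static Stream<String> readShard(Path shard) {
        try {
            return Files.lines(shard);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same shape as fromCollection(files).flatMap(read).map(work):
    // the input is the list of shard paths, not the records themselves.
    public static <R> List<R> process(List<Path> shards, Function<String, R> work) {
        return shards.parallelStream()
                // flatMap closes each per-shard stream once it has been consumed.
                .flatMap(ShardPipelineSketch::readShard)
                .map(work)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Two small fake shards in a temp directory instead of a real export.
        Path dir = Files.createTempDirectory("bq-export");
        Path first = Files.write(dir.resolve("shard-0.json"),
                List.of("{\"id\":1}", "{\"id\":2}"));
        Path second = Files.write(dir.resolve("shard-1.json"),
                List.of("{\"id\":3}"));

        List<Integer> recordLengths = process(List.of(first, second), String::length);
        System.out.println(recordLengths);
    }
}
```

The point of the design is that only the file names travel through the first step, so each worker opens and reads its own shards. In a real Flink job the flatMap function would do the equivalent read against the bucket.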

