Posted to user@flink.apache.org by ram kumar <ra...@gmail.com> on 2016/09/24 14:08:48 UTC

Guys, How to use JDBC connection in Flink using Scala

Hi Team,

I am wondering whether it is possible to use a JDBC connection or URL as a
source or target in Flink using Scala.
Could someone kindly help me with this? If you have any sample code,
please share it here.


Thanks,
Ram

Re: Guys, How to use JDBC connection in Flink using Scala

Posted by Sameer W <sa...@axiomine.com>.
Using plain JDBC against Redshift will be slow for any reasonable volume,
but if you need to do that, you can open a connection to it from a
RichFunction's open() method:
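
A minimal sketch of that approach, for illustration only (the event type,
table name, and credentials are made up): open the connection once per
parallel task in open(), write rows in invoke(), and release resources in
close().

import java.sql.{Connection, DriverManager, PreparedStatement}

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

// Hypothetical event type; adjust to your schema.
case class Event(id: Long, payload: String)

class RedshiftJdbcSink(jdbcUrl: String, user: String, password: String)
    extends RichSinkFunction[Event] {

  // Created in open() so the connection lives on the task manager
  // rather than being serialized from the client.
  @transient private var connection: Connection = _
  @transient private var statement: PreparedStatement = _

  override def open(parameters: Configuration): Unit = {
    connection = DriverManager.getConnection(jdbcUrl, user, password)
    statement = connection.prepareStatement(
      "INSERT INTO events (id, payload) VALUES (?, ?)")
  }

  override def invoke(value: Event): Unit = {
    statement.setLong(1, value.id)
    statement.setString(2, value.payload)
    statement.executeUpdate()
  }

  override def close(): Unit = {
    if (statement != null) statement.close()
    if (connection != null) connection.close()
  }
}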

I wrote a blog article a while back on how the Spark-Redshift package
works:
https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html

This image captures the internal read path in Spark-Redshift:
https://databricks.com/wp-content/uploads/2015/10/image01.gif, and this one
captures the write path:
https://databricks.com/wp-content/uploads/2015/10/image00.gif

In your case, you can read from the Kafka sources, partition the data
appropriately for Redshift, write the partitions to an S3 bucket, and then
invoke Redshift's COPY command to load the data from that bucket. This is
the same process described in the blog article above, just performed
explicitly.
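
As a rough sketch of that last step (the table, bucket, IAM role, and
connection details are all placeholders), the COPY can be issued over a
plain JDBC connection once the partition files have landed in S3:

import java.sql.DriverManager

object RedshiftCopy {
  def main(args: Array[String]): Unit = {
    // Placeholder cluster endpoint and credentials.
    val connection = DriverManager.getConnection(
      "jdbc:redshift://example-cluster:5439/dev", "user", "secret")
    try {
      val statement = connection.createStatement()
      // COPY loads the S3 files in parallel, which is far faster than
      // row-by-row INSERTs over JDBC.
      statement.execute(
        """COPY events FROM 's3://my-bucket/events/'
          |CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopy'
          |FORMAT AS JSON 'auto'""".stripMargin)
      statement.close()
    } finally {
      connection.close()
    }
  }
}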

Sameer





Re: Guys, How to use JDBC connection in Flink using Scala

Posted by Felix Dreissig <f3...@f30.me>.
Hi Ram,

On 24 Sep 2016, at 18:32, ram kumar <ra...@gmail.com> wrote:
> To connect to Redshift (for the staging and production databases), I need to set up a JDBC connection in Flink using Scala.
>
> Kafka (source) ------------> Flink (JDBC) ------------> AWS (S3 and Redshift) target
>
> Could you please suggest the best approach for this use case?

If you get your stream from somewhere else (here, Kafka) and the database (Redshift) is only your target, forget what I said in my previous mail.
You could just write a data sink that uses JDBC to connect to your target, but I don't have any battle-tested code for that.
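
Flink's flink-jdbc module does ship a JDBCOutputFormat whose builder
covers the common case, though. A rough, untested sketch (the Redshift
driver class, URL, and query are placeholders; check the exact package and
Row type against your Flink version):

import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat

object JdbcSinkSketch {
  // Batched INSERTs over JDBC: one "?" per field of each incoming Row.
  val outputFormat: JDBCOutputFormat = JDBCOutputFormat.buildJDBCOutputFormat()
    .setDrivername("com.amazon.redshift.jdbc.Driver")
    .setDBUrl("jdbc:redshift://example-cluster:5439/dev")
    .setUsername("user")
    .setPassword("secret")
    .setQuery("INSERT INTO events (id, payload) VALUES (?, ?)")
    .setBatchInterval(1000)
    .finish()

  // For a DataStream[Row]: stream.writeUsingOutputFormat(outputFormat)
}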

Regards,
Felix

Re: Guys, How to use JDBC connection in Flink using Scala

Posted by ram kumar <ra...@gmail.com>.
Many thanks, Felix.

Flink use case:



Extract data from the source (Kafka) and load it into the targets (AWS S3
and Redshift).

We use SCD2 (type 2 slowly changing dimensions) in Redshift, since data
changes need to be captured in the Redshift target.



To connect to Redshift (for the staging and production databases), I need
to set up a JDBC connection in Flink using Scala.



Kafka (source) ------------> Flink (JDBC) ------------> AWS (S3 and Redshift) target



Could you please suggest the best approach for this use case?


Regards

Ram.




Re: Guys, How to use JDBC connection in Flink using Scala

Posted by Felix Dreissig <f3...@f30.me>.
Hi Ram,

On 24 Sep 2016, at 16:08, ram kumar <ra...@gmail.com> wrote:
> I am wondering whether it is possible to use a JDBC connection or URL as a source or target in Flink using Scala.
> Could someone kindly help me with this? If you have any sample code, please share it here.

What’s your intended use case? Getting changes from a database or REST API into a data stream for processing in Flink?

If so, you could use a change data capture tool to write your changes to Kafka and then let Flink receive them from there. There are, for example, Bottled Water [1] for Postgres, and Maxwell [2] and Debezium [3] for MySQL.
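
As an illustration of the Flink side (the topic name, broker address, and
Kafka connector version are assumptions), consuming such a change topic
could look like this:

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object CdcStream {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "flink-cdc")

    // Each record is one change event published by the capture tool.
    val changes: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer09[String]("db.changes", new SimpleStringSchema(), props))

    changes.print()
    env.execute("cdc-stream")
  }
}
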
For REST, I suppose you’d have to periodically query the API and determine changes yourself. I don’t know if there are any tools to help you with that.

Regards,
Felix

[1] https://github.com/confluentinc/bottledwater-pg
[2] http://maxwells-daemon.io/
[3] http://debezium.io/