Posted to user@flink.apache.org by Sathi Chowdhury <Sa...@elliemae.com> on 2017/03/02 06:21:40 UTC

Data stream to write to multiple rds instances

Hi All,
Is there a preferred way to manage multiple JDBC connections from Flink? I am new to Flink and looking for guidance on the right pattern and APIs for this. The use case needs to route a stream to a particular JDBC connection depending on a field value, so the records are written to multiple destination databases.
Thanks
Sathi

On Feb 8, 2017, at 1:49 AM, Punit Tandel <pu...@ericsson.com> wrote:

Hi Chesnay,

That is what I have done so far: I read the schema from the database in order to create a new table in the JDBC database, and then write the rows coming from JDBCInputFormat.

Overall, I am trying to implement a solution that reads streaming data from a source, which could be Kafka, JDBC, Hive, or HDFS, and writes that data to an output, which again could be any of those.

As a simple use case I have taken one scenario, JDBC in and JDBC out. Since the JDBC input source returns a DataStream of Row, writing the rows into a JDBC database requires creating a table, which in turn requires the schema.

Thanks
Punit

On 02/08/2017 08:22 AM, Chesnay Schepler wrote:
Hello,

I don't understand why you explicitly need the schema since the batch JDBCInput-/Outputformats don't require it.
That's kind of the nice thing about Rows.

Would be cool if you could tell us what you're planning to do with the schema :)

In any case, to get the schema within the plan you will have to query the DB and build it yourself. Note that this is executed on the client.

Regards,
Chesnay
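
A minimal sketch of "query the DB and build it yourself", assuming plain JDBC and no Flink classes. The class name `SchemaDdl`, its `Column` type, and the table name used below are hypothetical and chosen only for illustration. `columnsOf` reads column names and type names from a `java.sql.ResultSetMetaData`, and `createTable` turns them into DDL text.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives a CREATE TABLE statement from JDBC result metadata. */
final class SchemaDdl {

    /** Column name plus the database-reported type name, e.g. ("id", "INTEGER"). */
    static final class Column {
        final String name;
        final String typeName;

        Column(String name, String typeName) {
            this.name = name;
            this.typeName = typeName;
        }
    }

    /** Reads the column names and type names; JDBC column indexes are 1-based. */
    static List<Column> columnsOf(ResultSetMetaData meta) throws SQLException {
        List<Column> columns = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            columns.add(new Column(meta.getColumnName(i), meta.getColumnTypeName(i)));
        }
        return columns;
    }

    /** Builds the DDL text. Identifiers are not quoted or validated in this sketch. */
    static String createTable(String table, List<Column> columns) {
        StringBuilder ddl = new StringBuilder("CREATE TABLE ").append(table).append(" (");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                ddl.append(", ");
            }
            ddl.append(columns.get(i).name).append(' ').append(columns.get(i).typeName);
        }
        return ddl.append(')').toString();
    }
}
```

The type names reported by one database are not necessarily valid DDL in another, and length or precision is not included here, so a real job would likely need a type-mapping step between the two calls.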

On 08.02.2017 00:39, Punit Tandel wrote:

Hi Robert

Thanks for the response. Is this functionality going to be implemented in a near-future Flink release?

Thanks

On 02/07/2017 04:12 PM, Robert Metzger wrote:
Currently, there is no streaming JDBC connector.
Check out this thread from last year: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/JDBC-Streaming-Connector-td10508.html

On Mon, Feb 6, 2017 at 5:00 PM, Ufuk Celebi <uc...@apache.org> wrote:
I'm not sure how well this works for the streaming API. Looping in
Chesnay, who worked on this.

On Mon, Feb 6, 2017 at 11:09 AM, Punit Tandel <pu...@ericsson.com> wrote:
> Hi,
>
> I was looking into the Flink streaming API and trying to implement a solution
> that reads data from a JDBC database and writes it to a JDBC database
> again.
>
> At the moment I can see that the DataStream returns Row from the database, but
> dataStream.getType().getGenericParameters() returns an empty
> list.
>
> Right now I am manually creating a database connection, getting the
> metadata from ResultSetMetaData, and constructing the schema for the table,
> which is a fairly heavy operation.
>
> Is there any other way to get the schema for the table, in order to create
> a new table and write those records to the database?
>
> Please let me know.
>
> Thanks
> Punit

=============Notice to Recipient: This e-mail transmission, and any documents, files or previous e-mail messages attached to it may contain information that is confidential or legally privileged, and intended for the use of the individual or entity named above. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that you must not read this transmission and that any disclosure, copying, printing, distribution or use of any of the information contained in or attached to this transmission is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify the sender by telephone or return e-mail and delete the original transmission and its attachments without reading or saving in any manner. Thank you. =============

Re: Data stream to write to multiple rds instances

Posted by Till Rohrmann <tr...@apache.org>.

Hi Sathi,

if you read data from Kinesis, then Flink can offer you exactly-once
processing guarantees. However, what you see written out to your database
depends a little bit on the implementation of your custom sink. If you have a
synchronous JDBC client which does not lose data, and you fail your job
whenever you see an error, then you should achieve at-least-once.

Cheers,
Till
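
A toy simulation of why a fail-fast synchronous sink gives at-least-once rather than exactly-once output. On recovery, Flink restores from the last completed checkpoint and the source re-reads from the position stored there, so records processed between that checkpoint and the failure reach the sink a second time. Nothing below is Flink API: `ReplayDemo`, its parameters, and the integer "records" are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of checkpoint-based replay; none of this is Flink API. */
final class ReplayDemo {

    /**
     * Feeds records 0..total-1 to a sink. A checkpoint stores the next offset to read
     * after every checkpointEvery records. The write at offset failAt fails once, and
     * the run then resumes from the last checkpointed offset.
     */
    static List<Integer> run(int total, int checkpointEvery, int failAt) {
        List<Integer> written = new ArrayList<>(); // stands in for the external database
        int checkpointedOffset = 0;
        boolean failedOnce = false;
        int offset = 0;
        while (offset < total) {
            if (offset == failAt && !failedOnce) {
                failedOnce = true;
                // Recovery: the source rewinds, but rows already in the database stay there.
                offset = checkpointedOffset;
                continue;
            }
            written.add(offset); // synchronous write succeeded
            offset++;
            if (offset % checkpointEvery == 0) {
                checkpointedOffset = offset;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // 10 records, checkpoint every 5, one failure when writing record 7.
        System.out.println(run(10, 5, 7));
    }
}
```

With these arguments records 5 and 6 are written twice and no record is lost. If duplicates matter, one common option is to make the write idempotent on the database side, for example by keying rows on a unique record identifier.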

Re: Data stream to write to multiple rds instances

Posted by Sathi Chowdhury <Sa...@elliemae.com>.

Hi Till,
Thanks for your reply. I guess I will have to write a custom sink function that uses JDBCOutputFormat. I have a question about checkpointing support, though: if I am reading a stream from Kinesis (streamA), it is transformed to streamB, and that is written to the DB, then since streamB is checkpointed, will the program restart from streamB's checkpointed offset when it recovers? In that case checkpointing the JDBC side may not be so important.
Thanks
Sathi

Re: Data stream to write to multiple rds instances

Posted by Till Rohrmann <tr...@apache.org>.

Hi Sathi,

you can use split/select or filter on your data stream, based on the field's
value. This gives you multiple data streams, each of which you can write out
using its own JDBCOutputFormat. Be aware, however, that the JDBCOutputFormat
does not give you any processing guarantees, since it does not take part in
Flink's checkpointing mechanism. Unfortunately, Flink does not have a
streaming JDBC connector yet.

Cheers,
Till
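
A minimal sketch of routing by field value in plain Java. This is a variant of the idea rather than the split/select approach itself: a single component holds one writer per destination and picks it from a key field of each record. `FieldRouter`, `RowWriter`, and the tenant keys are hypothetical names; in a Flink job this logic would presumably sit inside a custom sink function, with each writer wrapping its own JDBC connection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical router: picks a destination writer from one field of each record. */
final class FieldRouter {

    /** Minimal write abstraction; a real one would wrap a JDBC connection. */
    interface RowWriter {
        void write(String[] row);
    }

    private final int keyIndex;
    private final Map<String, RowWriter> writersByKey = new HashMap<>();

    FieldRouter(int keyIndex) {
        this.keyIndex = keyIndex;
    }

    void register(String key, RowWriter writer) {
        writersByKey.put(key, writer);
    }

    /** Sends the row to the writer registered for its key; unknown keys fail loudly. */
    void route(String[] row) {
        RowWriter writer = writersByKey.get(row[keyIndex]);
        if (writer == null) {
            throw new IllegalArgumentException("No destination for key: " + row[keyIndex]);
        }
        writer.write(row);
    }

    public static void main(String[] args) {
        // In-memory lists stand in for two destination databases.
        List<String[]> dbA = new ArrayList<>();
        List<String[]> dbB = new ArrayList<>();
        FieldRouter router = new FieldRouter(0);
        router.register("tenantA", dbA::add);
        router.register("tenantB", dbB::add);
        router.route(new String[] {"tenantA", "1"});
        router.route(new String[] {"tenantB", "2"});
        router.route(new String[] {"tenantA", "3"});
        System.out.println(dbA.size() + " rows for A, " + dbB.size() + " rows for B");
    }
}
```

Throwing on an unknown key matches the fail-the-job-on-error behaviour discussed in this thread. Like the JDBCOutputFormat, this sketch takes no part in checkpointing, so it carries the same lack of processing guarantees.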
