Posted to issues@flink.apache.org by "vinoyang (JIRA)" <ji...@apache.org> on 2018/09/24 10:55:01 UTC

[jira] [Comment Edited] (FLINK-10315) Let JDBCAppendTableSink be built with java.sql.Connection

    [ https://issues.apache.org/jira/browse/FLINK-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625632#comment-16625632 ] 

vinoyang edited comment on FLINK-10315 at 9/24/18 10:54 AM:
------------------------------------------------------------

Hi [~flacombe],

1) A custom sink is essentially a UDF. If you don't manage expensive resources through its open/close lifecycle methods, you risk resource leaks, because it is not clearly defined who is responsible for closing them. In a streaming sink, many resources are long-lived. If a database connection is passed in from outside and the outside code closes it, the job will fail. Currently, a database connection is constructed with code similar to:

    conn = DriverManager.getConnection(DB_URL, USER, PASS);

This code runs on the client side, but the connection needs to be used in the TaskManager (TM).
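To make the open/close point concrete, here is a minimal sketch in plain Java. The class JdbcSinkSketch and its open(), close(), and isOpen() methods are hypothetical stand-ins that only mirror the lifecycle of a Flink rich function; this is not Flink's actual sink class. The object carries only the connection settings, and the connection itself is created and released by the same task instance that uses it.

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical stand-in for a sink function; it does not extend any Flink class. */
final class JdbcSinkSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only plain settings are stored when the object is built on the client.
    private final String dbUrl;
    private final String user;
    private final String password;

    // The connection is created in open(); transient keeps it out of serialization.
    private transient Connection conn;

    JdbcSinkSketch(String dbUrl, String user, String password) {
        this.dbUrl = dbUrl;
        this.user = user;
        this.password = password;
    }

    /** Meant to be called once per task instance, on the TM, before any record is written. */
    void open() throws SQLException {
        conn = DriverManager.getConnection(dbUrl, user, password);
    }

    /** Meant to be called once per task instance when the task finishes or fails. */
    void close() throws SQLException {
        if (conn != null) {
            conn.close();
            conn = null;
        }
    }

    /** True while this instance holds a connection it opened itself. */
    boolean isOpen() {
        return conn != null;
    }
}
```

With this shape, each task instance owns exactly the connection it opened, so there is no question about who has to close it.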

2) Regarding the second point: if multiple task instances connect to the database, sharing one connection among them is also not a good idea. The reason is again lifecycle management: the connection is an expensive resource with no agreed-upon manager (you need to know that Flink does not coordinate communication between multiple task instances in order to manage such resources).

If you really want to use database connections more efficiently, you can consider connection pooling, but this complicates the problem: in what scope would the connection pool be shared? Jobs currently written using the Flink API cannot achieve this.

The most critical issue is that Flink requires all fields in a UDF to be serializable, which is difficult to achieve for database connections.
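The serialization problem can be reproduced with plain Java serialization, assuming that is how the function object gets shipped to the TM. In the sketch below, FakeConnection, EagerSink, LazySink, and isShippable are all hypothetical names made up for illustration; FakeConnection merely stands in for a live java.sql.Connection, which typically does not implement Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCheck {

    /** Stand-in for a live connection: it does not implement Serializable. */
    static final class FakeConnection { }

    /** Holds the connection as a regular field, as a setConnection() builder would. */
    static final class EagerSink implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeConnection conn = new FakeConnection();
    }

    /** Carries only settings; the connection field is transient and starts out null. */
    static final class LazySink implements Serializable {
        private static final long serialVersionUID = 1L;
        final String dbUrl;
        transient FakeConnection conn; // would be created later, on the worker

        LazySink(String dbUrl) {
            this.dbUrl = dbUrl;
        }
    }

    /** Returns true if standard Java serialization accepts the object. */
    static boolean isShippable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isShippable(new EagerSink()));
        System.out.println(isShippable(new LazySink("jdbc:hypothetical://db")));
    }
}
```

Only the variant that ships settings and defers creating the connection passes the check, which is why the builder takes a URL, username, and password rather than a ready-made connection.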

In the final analysis, a resource that is initialized and opened by the client (whether or not it can be serialized) is not suitable for sharing by multiple instances in the TM. Otherwise, by your reasoning, other connectors could choose to do the same.



> Let JDBCAppendTableSink be built with java.sql.Connection
> ---------------------------------------------------------
>
>                 Key: FLINK-10315
>                 URL: https://issues.apache.org/jira/browse/FLINK-10315
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>         Environment: I'm currently using Flink 1.6.0 Java.
>            Reporter: François Lacombe
>            Assignee: vinoyang
>            Priority: Major
>
> Currently, JDBCAppendTableSink is built with methods like setDBUrl, setUsername, setPassword... and so on.
> We can't use an existing Java SQL connection to build it.
> It would be great to add a setConnection() method to the builder class, so that sensitive data like the username or password does not have to travel through large stacks from config connectors (often in main()) to JDBC sinks.
> Providing a single object is far lighter than passing 4 or 5 strings.
>  
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)