Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/02/18 12:23:59 UTC

[GitHub] [incubator-seatunnel] xsaffable opened a new pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

xsaffable opened a new pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287


   <!--
   
   Thank you for contributing to SeaTunnel! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [GITHUB issue](https://github.com/apache/incubator-seatunnel/issues).
   
     - Name the pull request in the form "[Feature] [component] Title of the pull request", where *Feature* can be replaced by `Hotfix`, `Bug`, etc.
   
     - Minor fixes should be named following this pattern: `[hotfix] [docs] Fix typo in README.md doc`.
   
   -->
   
   ## Purpose of this pull request
   
   <!-- Describe the purpose of this pull request. For example: This pull request adds checkstyle plugin.-->
   
   ## Check list
   
   * [ ] Code changes are covered with tests, or they do not need tests for this reason:
   * [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/developement/NewLicenseGuide.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] asdf2014 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r813005355



##########
File path: seatunnel-connectors/seatunnel-connector-flink-druid/src/main/java/org/apache/seatunnel/flink/sink/DruidSink.java
##########
@@ -45,14 +47,16 @@
     private String timestampFormat;
     private String timestampMissingValue;
 
+    @Nullable
     @Override
-    public DataSink<Row> outputBatch(FlinkEnvironment env, DataSet<Row> dataSet) {
-        DataSink<Row> dataSink = dataSet.output(new DruidOutputFormat(coordinatorURL, datasource, timestampColumn, timestampFormat, timestampMissingValue));
+    public DataStreamSink<Row> outputStream(FlinkEnvironment env, DataStream<Row> dataStream) {
+        DataStreamSink<Row> rowDataStreamSink = dataStream.addSink(new DruidSinkFunction<>(
+                new DruidOutputFormat<>(coordinatorURL, datasource, timestampColumn, timestampFormat, timestampMissingValue)));
         if (config.hasPath(PARALLELISM)) {
             int parallelism = config.getInt(PARALLELISM);
-            return dataSink.setParallelism(parallelism);
+            rowDataStreamSink.setParallelism(parallelism);
         }
-        return dataSink;
+        return null;

Review comment:
       @xsaffable Thanks for your contribution; let's wait for the CI to pass. Sorry, I don't have permission to start the pending workflows for you. Maybe @leo65535 can help you out.







[GitHub] [incubator-seatunnel] asdf2014 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r813481241




Review comment:
       @leo65535 You are welcome







[GitHub] [incubator-seatunnel] asdf2014 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812850642




Review comment:
       May I ask why we don't return the sink instance here?
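
For readers following the thread, here is a minimal sketch of the alternative the question points at: keep a reference to the sink, apply the optional parallelism, and return the sink rather than `null`. It is not the code from the pull request. `StubSink` is a hypothetical stand-in for Flink's `DataStreamSink`, and the `Optional<Integer>` parameter stands in for the `config.hasPath(PARALLELISM)` check, so that the example runs without Flink on the classpath.

```java
import java.util.Optional;

public class ReturnSinkSketch {

    /** Hypothetical stand-in for Flink's DataStreamSink; it only tracks parallelism. */
    static final class StubSink {
        private int parallelism = 1;

        StubSink setParallelism(int parallelism) {
            this.parallelism = parallelism;
            return this; // fluent setter, so callers can chain or ignore the result
        }

        int getParallelism() {
            return parallelism;
        }
    }

    /** Builds the sink, applies the optional parallelism, and returns the sink instead of null. */
    static StubSink outputStream(Optional<Integer> configuredParallelism) {
        StubSink sink = new StubSink(); // stands in for dataStream.addSink(...)
        configuredParallelism.ifPresent(sink::setParallelism);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(outputStream(Optional.of(4)).getParallelism());
        System.out.println(outputStream(Optional.empty()).getParallelism());
    }
}
```

Returning the instance keeps the method usable by callers that want to configure the sink further, at the cost of exposing a value that the main logic may never read.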







[GitHub] [incubator-seatunnel] xsaffable commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812910071




Review comment:
       I can change this function to return void and then resubmit this PR.
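
As an illustration of the proposal above, here is a rough sketch of a sink interface whose method returns `void`, so that implementations no longer need a `@Nullable` return value. All names (`StubStream`, `StubStreamSink`, `PrintSink`) are hypothetical and are not SeaTunnel's or Flink's actual types; the real interface in the project may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class VoidSinkInterfaceSketch {

    /** Hypothetical stand-in for a stream: it only records the sinks attached to it. */
    static final class StubStream<T> {
        final List<Consumer<T>> attachedSinks = new ArrayList<>();

        void addSink(Consumer<T> sink) {
            attachedSinks.add(sink);
        }
    }

    /** Hypothetical sink plugin interface whose method returns nothing. */
    interface StubStreamSink<T> {
        void outputStream(StubStream<T> stream);
    }

    /** Example plugin: attaching the sink is a side effect, so there is nothing to return. */
    static final class PrintSink implements StubStreamSink<String> {
        @Override
        public void outputStream(StubStream<String> stream) {
            stream.addSink(System.out::println);
        }
    }

    public static void main(String[] args) {
        StubStream<String> stream = new StubStream<>();
        new PrintSink().outputStream(stream);
        System.out.println(stream.attachedSinks.size());
    }
}
```

With a `void` signature the contract says plainly that callers get nothing back, which avoids the ambiguity of a method that is declared to return a sink but yields `null`.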







[GitHub] [incubator-seatunnel] wuchunfu commented on pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#issuecomment-1068753337


   @xsaffable Please resolve the conflicting files, thanks





[GitHub] [incubator-seatunnel] xsaffable commented on pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable commented on pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#issuecomment-1056506000


   @leo65535 Hi, PTAL. Thanks.





[GitHub] [incubator-seatunnel] leo65535 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
leo65535 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r813462583




Review comment:
       Thanks for your review @asdf2014 







[GitHub] [incubator-seatunnel] xsaffable commented on pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable commented on pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#issuecomment-1069013451


   > @xsaffable Please resolve the conflicting files, thanks
   
   Hello, I have resolved the conflicting files.





[GitHub] [incubator-seatunnel] asdf2014 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812862706




Review comment:
       Indeed, it would be better if we improved the interface directly.







[GitHub] [incubator-seatunnel] xsaffable commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812996206




Review comment:
       I'm done. @leo65535 @asdf2014 







[GitHub] [incubator-seatunnel] leo65535 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
leo65535 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812886779




Review comment:
       You are right.







[GitHub] [incubator-seatunnel] leo65535 commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
leo65535 commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r812860504




Review comment:
       In the main logic, we don't use the returned sink instance.







[GitHub] [incubator-seatunnel] leo65535 commented on pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
leo65535 commented on pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#issuecomment-1048462559


   Overall LGTM, cc @asdf2014 





[GitHub] [incubator-seatunnel] xsaffable commented on a change in pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable commented on a change in pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#discussion_r813854607




Review comment:
       Thank you @leo65535 @asdf2014. But it seems to have timed out; can you help me?







[GitHub] [incubator-seatunnel] xsaffable edited a comment on pull request #1287: [Feature] [flink] Remove flink DataSet api (#1246)

Posted by GitBox <gi...@apache.org>.
xsaffable edited a comment on pull request #1287:
URL: https://github.com/apache/incubator-seatunnel/pull/1287#issuecomment-1069013451


   > @xsaffable Please resolve the conflicting files, thanks
   
   Hello, I have resolved the conflicting files. @wuchunfu 

