You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/01/18 10:45:02 UTC

[GitHub] [incubator-seatunnel] huzk8 opened a new pull request #1088: [refactor] change spark base api to java

huzk8 opened a new pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088


   <!--
   
   Thank you for contributing to SeaTunnel! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [GITHUB issue](https://github.com/apache/incubator-seatunnel/issues).
   
     - Name the pull request in the form "[SeaTunnel #XXXX] [component] Title of the pull request", where *SeaTunnel #XXXX* should be replaced by the actual issue number.
   
     - Minor fixes should be named following this pattern: `[hotfix] [docs] Fix typo in README.md doc`.
   
   -->
   
   ## Purpose of this pull request
   
   <!-- Describe the purpose of this pull request. For example: This pull request adds checkstyle plugin.-->
   change spark batch base api to java code.
   related to issue :#766 .
   base on pr : #742 
   ## Check list
   
   * [x] Code changed are covered with tests, or it does not need tests for reason:
   * [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] huzk8 commented on a change in pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
huzk8 commented on a change in pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#discussion_r787529762



##########
File path: seatunnel-apis/seatunnel-api-spark/src/main/scala/org/apache/seatunnel/spark/BaseSparkSink.java
##########
@@ -14,19 +14,28 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.seatunnel.spark
 
-import org.apache.seatunnel.apis.BaseSource
-import org.apache.seatunnel.shade.com.typesafe.config.{Config, ConfigFactory}
+package org.apache.seatunnel.spark;
 
-trait BaseSparkSource[Data] extends BaseSource[SparkEnvironment] {
+import org.apache.seatunnel.apis.BaseSink;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+import org.apache.seatunnel.shade.com.typesafe.config.ConfigFactory;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 
-  protected var config: Config = ConfigFactory.empty()
+public abstract class BaseSparkSink<OUT>  implements BaseSink<SparkEnvironment> {
+    protected Config config = ConfigFactory.empty();
 
-  override def setConfig(config: Config): Unit = this.config = config
+    @Override
+    public void setConfig(Config config){
+        this.config = config;
+    }
 
-  override def getConfig: Config = config
+    @Override
+    public Config getConfig(){
+        return config;
+    }
 
-  def getData(env: SparkEnvironment): Data;
+    public abstract OUT output(Dataset<Row> data, SparkEnvironment env);

Review comment:
       the origin code is below.
   ```
     def output(data: Dataset[Row], env: SparkEnvironment): OUT;
   
   ```
   Could you describe more detial?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] wuchunfu commented on pull request #1088: [Feature] [api] Change spark base api to java

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#issuecomment-1036931706


   @huzk8  Thank you very much for your contribution, someone has already submitted and merged this part of the code, please check https://github.com/apache/incubator-seatunnel/pull/1188 and look forward to your next contribution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] huzk8 commented on pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
huzk8 commented on pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#issuecomment-1019668869


   @CalvinKirs @RickyHuo @garyelephant  review request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] wuchunfu commented on a change in pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on a change in pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#discussion_r786768654



##########
File path: seatunnel-apis/seatunnel-api-spark/src/main/scala/org/apache/seatunnel/spark/BaseSparkSink.java
##########
@@ -14,19 +14,28 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.seatunnel.spark
 
-import org.apache.seatunnel.apis.BaseSource
-import org.apache.seatunnel.shade.com.typesafe.config.{Config, ConfigFactory}
+package org.apache.seatunnel.spark;
 
-trait BaseSparkSource[Data] extends BaseSource[SparkEnvironment] {
+import org.apache.seatunnel.apis.BaseSink;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+import org.apache.seatunnel.shade.com.typesafe.config.ConfigFactory;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 
-  protected var config: Config = ConfigFactory.empty()
+public abstract class BaseSparkSink<OUT>  implements BaseSink<SparkEnvironment> {
+    protected Config config = ConfigFactory.empty();
 
-  override def setConfig(config: Config): Unit = this.config = config
+    @Override
+    public void setConfig(Config config){
+        this.config = config;
+    }
 
-  override def getConfig: Config = config
+    @Override
+    public Config getConfig(){
+        return config;
+    }
 
-  def getData(env: SparkEnvironment): Data;
+    public abstract OUT output(Dataset<Row> data, SparkEnvironment env);

Review comment:
       Why modify input parameters and output types?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] huzk8 commented on a change in pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
huzk8 commented on a change in pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#discussion_r790391025



##########
File path: seatunnel-apis/seatunnel-api-spark/src/main/scala/org/apache/seatunnel/spark/BaseSparkSink.java
##########
@@ -14,19 +14,28 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.seatunnel.spark
 
-import org.apache.seatunnel.apis.BaseSource
-import org.apache.seatunnel.shade.com.typesafe.config.{Config, ConfigFactory}
+package org.apache.seatunnel.spark;
 
-trait BaseSparkSource[Data] extends BaseSource[SparkEnvironment] {
+import org.apache.seatunnel.apis.BaseSink;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+import org.apache.seatunnel.shade.com.typesafe.config.ConfigFactory;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 
-  protected var config: Config = ConfigFactory.empty()
+public abstract class BaseSparkSink<OUT>  implements BaseSink<SparkEnvironment> {
+    protected Config config = ConfigFactory.empty();
 
-  override def setConfig(config: Config): Unit = this.config = config
+    @Override
+    public void setConfig(Config config){
+        this.config = config;
+    }
 
-  override def getConfig: Config = config
+    @Override
+    public Config getConfig(){
+        return config;
+    }
 
-  def getData(env: SparkEnvironment): Data;
+    public abstract OUT output(Dataset<Row> data, SparkEnvironment env);

Review comment:
       Git Record make confuse.
   The real  code different is below.
   ![image](https://user-images.githubusercontent.com/18548053/150716806-90186fff-c15d-48fc-aa23-54b952e0c9cf.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] wuchunfu commented on a change in pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on a change in pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#discussion_r786766929



##########
File path: seatunnel-apis/seatunnel-api-spark/src/main/scala/org/apache/seatunnel/spark/BaseSparkTransform.java
##########
@@ -14,20 +14,28 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.seatunnel.spark
 
-import org.apache.spark.sql.{Dataset, Row}
-import org.apache.seatunnel.apis.BaseTransform
-import org.apache.seatunnel.shade.com.typesafe.config.{Config, ConfigFactory}
+package org.apache.seatunnel.spark;
 
-trait BaseSparkTransform extends BaseTransform[SparkEnvironment] {
+import org.apache.seatunnel.apis.BaseTransform;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+import org.apache.seatunnel.shade.com.typesafe.config.ConfigFactory;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 
-  protected var config: Config = ConfigFactory.empty()
+public abstract class BaseSparkTransform implements BaseTransform<SparkEnvironment> {
+    protected Config config = ConfigFactory.empty();
 
-  override def setConfig(config: Config): Unit = this.config = config
+    @Override
+    public void setConfig(Config config) {
+        this.config = config;
+    }
 
-  override def getConfig: Config = config
+    @Override
+    public Config getConfig() {
+        return config;
+    }
 
-  def process(data: Dataset[Row], env: SparkEnvironment): Dataset[Row];
+    public abstract Dataset<Row> process(Dataset<Row>data, SparkEnvironment env);
 

Review comment:
       Please modify `Dataset<Row>data` to `Dataset<Row> data`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] wuchunfu commented on a change in pull request #1088: [refactor] change spark base api to java

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on a change in pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088#discussion_r788348289



##########
File path: seatunnel-apis/seatunnel-api-spark/src/main/scala/org/apache/seatunnel/spark/BaseSparkSink.java
##########
@@ -14,19 +14,28 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.seatunnel.spark
 
-import org.apache.seatunnel.apis.BaseSource
-import org.apache.seatunnel.shade.com.typesafe.config.{Config, ConfigFactory}
+package org.apache.seatunnel.spark;
 
-trait BaseSparkSource[Data] extends BaseSource[SparkEnvironment] {
+import org.apache.seatunnel.apis.BaseSink;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+import org.apache.seatunnel.shade.com.typesafe.config.ConfigFactory;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
 
-  protected var config: Config = ConfigFactory.empty()
+public abstract class BaseSparkSink<OUT>  implements BaseSink<SparkEnvironment> {
+    protected Config config = ConfigFactory.empty();
 
-  override def setConfig(config: Config): Unit = this.config = config
+    @Override
+    public void setConfig(Config config){
+        this.config = config;
+    }
 
-  override def getConfig: Config = config
+    @Override
+    public Config getConfig(){
+        return config;
+    }
 
-  def getData(env: SparkEnvironment): Data;
+    public abstract OUT output(Dataset<Row> data, SparkEnvironment env);

Review comment:
       > the origin code is below.
   > 
   > ```
   >   def output(data: Dataset[Row], env: SparkEnvironment): OUT;
   > ```
   > 
   > Could you describe more detial?
   
   Am I reading it wrong?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] huzk8 closed pull request #1088: [Feature] [api] Change spark base api to java

Posted by GitBox <gi...@apache.org>.
huzk8 closed pull request #1088:
URL: https://github.com/apache/incubator-seatunnel/pull/1088


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org