Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/03/10 07:08:55 UTC

[GitHub] [incubator-seatunnel] Rianico opened a new pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Rianico opened a new pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459


   
   ## Purpose of this pull request
   Closes #1458
   
   ## Check list
   
   * [ ] Code changes are covered with tests, or do not need tests for reason:
   * [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/developement/NewLicenseGuide.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] yx91490 commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
yx91490 commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r824306376



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       How about upgrading the kudu-spark version? 1.7.0 seems a little old.







[GitHub] [incubator-seatunnel] Rianico commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
Rianico commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r823440777



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       `format("kudu")` is implemented by `org.apache.kudu:kudu-spark2_2.11`, but it only takes effect from `1.9.0`; earlier versions throw a `java.lang.ClassNotFoundException: kudu.DefaultSource`.  
   The version I see in the pom is `1.7.0`:
   ```xml
   <kudu-spark.version>1.7.0</kudu-spark.version>
   ```
   So it is better to call the `kudu` function directly; then we don't need to care whether to use `format("kudu")` or `format("org.apache.kudu.spark.kudu")`.
     
   Also, when there are many Spark options, Kudu's official code example may not read so clearly.
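
A minimal sketch of why the explicit `format(...)` call removed by this PR was redundant. It assumes the `.kudu` implicit brought in by `import org.apache.kudu.spark.kudu._` simply sets the format to `org.apache.kudu.spark.kudu` and then calls `load`. `MockReader` and `KuduReaderOps` are hypothetical stand-ins for Spark's `DataFrameReader` and the kudu-spark implicit, used only so the example runs without Spark or Kudu on the classpath.

```scala
object KuduFormatSketch {
  // Hypothetical stand-in for Spark's DataFrameReader; it only records what was configured.
  final case class MockReader(
      source: Option[String] = None,
      opts: Map[String, String] = Map.empty) {
    def format(name: String): MockReader = copy(source = Some(name))
    def options(extra: Map[String, String]): MockReader = copy(opts = opts ++ extra)
    // Returns a description of the load instead of a real DataFrame.
    def load(): String = {
      val src = source.getOrElse("<unset>")
      val rendered = opts.toSeq.sorted.map { case (k, v) => k + "=" + v }.mkString(",")
      s"source=$src;$rendered"
    }
  }

  // Assumed shape of the implicit provided by the kudu-spark package object.
  implicit class KuduReaderOps(reader: MockReader) {
    def kudu: String = reader.format("org.apache.kudu.spark.kudu").load()
  }

  val conf: Map[String, String] =
    Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "default.my_table")

  // Before the PR: the format is set explicitly, then set again inside `.kudu`.
  val before: String = MockReader().format("org.apache.kudu.spark.kudu").options(conf).kudu
  // After the PR: `.kudu` alone selects the source.
  val after: String = MockReader().options(conf).kudu

  def main(args: Array[String]): Unit = {
    println(before)
    println(after)
    assert(before == after, "both chains should resolve to the same source and options")
  }
}
```

Under that assumption both chains resolve to the same source and options, so dropping the explicit call does not change behavior on either `1.7.0` or later versions.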







[GitHub] [incubator-seatunnel] wuchunfu commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r824354989



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       > how about upgrade the kudu-spark version? 1.7.0 seems a little old.
   
   +1







[GitHub] [incubator-seatunnel] Rianico commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
Rianico commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r825268606



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       @wuchunfu How about this idea? I'm also willing to contribute the code for upgrading Kudu's version and adding unit tests. :smile: But this is a separate topic, so maybe I should create another issue and PR?







[GitHub] [incubator-seatunnel] CalvinKirs merged pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
CalvinKirs merged pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459


   






[GitHub] [incubator-seatunnel] Rianico commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
Rianico commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r824486115



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       > how about upgrade the kudu-spark version? 1.7.0 seems a little old.  
   
   +1. How about `1.12.0`? I saw that many high-priority issues in Jira were fixed in this version.
   There are also two key points about choosing the Kudu version:
   - From `1.9.0`, Kudu offers a way to write JUnit tests; see: [testing-apache-kudu-applications-on-the-jvm](https://kudu.apache.org/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html#testing-apache-kudu-applications-on-the-jvm).
   - From `1.10.0`, Kudu supports both full and incremental table backups via a job implemented using Apache Spark. We could build more Kudu features if we use `1.10.0` or later. But maybe we should select `1.10.1` due to [KUDU-2990](https://issues.apache.org/jira/browse/KUDU-2990).







[GitHub] [incubator-seatunnel] CalvinKirs commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r825690912



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       It's better to create an email thread to discuss this.








[GitHub] [incubator-seatunnel] wuchunfu commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r823411441



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       @Rianico It seems this way is better:
   ```scala
   // Create a DataFrame that points to the Kudu table we want to query.
   val df = spark.read.options(Map("kudu.master" -> "kudu.master:7051",
                                    "kudu.table" -> "default.my_table")).format("kudu").load
   ```
   
   Please refer to: https://kudu.apache.org/docs/developing.html
   








[GitHub] [incubator-seatunnel] Rianico commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
Rianico commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r824486115



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       > how about upgrade the kudu-spark version? 1.7.0 seems a little old.  
   
   +1. How about `1.12.0`? I saw that many high-priority issues in Jira were fixed in this version.
   There are also two key points about choosing the Kudu version:
   - From `1.9.0`, Kudu offers a way to write JUnit tests; see: [testing-apache-kudu-applications-on-the-jvm](https://kudu.apache.org/2019/03/19/testing-apache-kudu-applications-on-the-jvm.html#testing-apache-kudu-applications-on-the-jvm).
   - From `1.10.0`, Kudu supports both full and incremental table backups via a job implemented using Apache Spark. We could build more Kudu features if we use `1.10.0` or later. But maybe we should select a version newer than `1.10.1` due to [KUDU-2990](https://issues.apache.org/jira/browse/KUDU-2990).







[GitHub] [incubator-seatunnel] yx91490 commented on a change in pull request #1459: [Improve] [spark-source-kudu] Remove repeated format function call in spark-source-kudu

Posted by GitBox <gi...@apache.org>.
yx91490 commented on a change in pull request #1459:
URL: https://github.com/apache/incubator-seatunnel/pull/1459#discussion_r825239439



##########
File path: seatunnel-connectors/seatunnel-connectors-spark/seatunnel-connector-spark-kudu/src/main/scala/org/apache/seatunnel/spark/source/Kudu.scala
##########
@@ -36,7 +36,6 @@ class Kudu extends SparkBatchSource {
       "kudu.table" -> config.getString("kudu_table"))
 
     val ds = env.getSparkSession.read
-      .format("org.apache.kudu.spark.kudu")
       .options(mapConf)
       .kudu

Review comment:
       vote for 1.12.0



