You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/05/13 09:59:44 UTC

[GitHub] [incubator-seatunnel] tmljob opened a new pull request, #1872: [Feature][Connector]Spark connector for excel Source

tmljob opened a new pull request, #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872

   <!--
   
   Thank you for contributing to SeaTunnel! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [GITHUB issue](https://github.com/apache/incubator-seatunnel/issues).
   
     - Name the pull request in the form "[Feature] [component] Title of the pull request", where *Feature* can be replaced by `Hotfix`, `Bug`, etc.
   
     - Minor fixes should be named following this pattern: `[hotfix] [docs] Fix typo in README.md doc`.
   
   -->
   
   ## Purpose of this pull request
   
   Spark connector for excel Source
   
   <!-- Describe the purpose of this pull request. For example: This pull request adds checkstyle plugin.-->
   
   ## Check list
   
   * [ ] Code changed are covered with tests, or it does not need tests for reason:
   * [ ] If any new Jar binary package adding in your PR, please add License Notice according
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] CalvinKirs commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1132434608

   > @BenJFan @wuchunfu @CalvinKirs Is there anything else that needs to be modified in my PR? Why can't it be merged into dev?
   
   did you resolve CI fail first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r880075762


##########
tools/dependencies/known-dependencies.txt:
##########
@@ -707,4 +707,17 @@ zookeeper-3.4.6.jar
 zookeeper-3.5.9.jar
 zookeeper-jute-3.5.9.jar
 zstd-jni-1.3.3-1.jar
-zstd-jni-1.4.3-1.jar
\ No newline at end of file
+zstd-jni-1.4.3-1.jar

Review Comment:
   The corresponding dependencies should be put into the corresponding lines, according to lexicographical order



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r881521564


##########
seatunnel-dist/release-docs/LICENSE:
##########
@@ -896,6 +896,7 @@ The text of each license is the standard Apache 2.0 license.
      (The Apache Software License, Version 2.0) transport-netty4 (org.elasticsearch.plugin:transport-netty4-client:7.5.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) x-content (org.elasticsearch:elasticsearch-x-content:6.3.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) zookeeper (org.apache.zookeeper:zookeeper:3.4.10 - no url defined)
+     (The Apache Software License, Version 2.0) spark-excel (com.crealytics:spark-excel_2.11:0.11.1 - https://github.com/crealytics/spark-excel)

Review Comment:
   Seem like you should add other license in this file. like 
   ```
   poi-4.0.0.jar
   poi-ooxml-4.0.0.jar
   poi-ooxml-schemas-4.0.0.jar
   ```
   They are new jar you add on known-dependcies.txt, so you should add license link on this file, you can copy example from already exist license.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] tmljob commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
tmljob commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1132390237

   @BenJFan @wuchunfu @CalvinKirs 
   Is there anything else that needs to be modified in my PR? Why can't it be merged into dev?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r880143979


##########
seatunnel-dist/release-docs/LICENSE:
##########
@@ -896,6 +896,7 @@ The text of each license is the standard Apache 2.0 license.
      (The Apache Software License, Version 2.0) transport-netty4 (org.elasticsearch.plugin:transport-netty4-client:7.5.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) x-content (org.elasticsearch:elasticsearch-x-content:6.3.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) zookeeper (org.apache.zookeeper:zookeeper:3.4.10 - no url defined)
+     (The Apache Software License, Version 2.0) spark-excel (com.crealytics:spark-excel_2.11:0.11.1 - https://github.com/crealytics/spark-excel)

Review Comment:
   > I introduced the spark-excel tripartite package. In addition to listing the license of spark-excel, I also need to list the licesne of the library it depends on, right?
   
   Right, just add license which that you add jar name in know-dependencies.txt



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] tmljob commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
tmljob commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1132458900

   > > @BenJFan @wuchunfu @CalvinKirs Is there anything else that needs to be modified in my PR? Why can't it be merged into dev?
   > 
   > can you resolve CI fail first?
   
   I don't understand why the build fails under ubuntu, and no clear error message is provided.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] CalvinKirs commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1135410865

   > > > > @BenJFan @wuchunfu @CalvinKirs Is there anything else that needs to be modified in my PR? Why can't it be merged into dev?
   > > > 
   > > > 
   > > > can you resolve CI fail first?
   > > 
   > > 
   > > I don't understand why the build fails under ubuntu, and no clear error message is provided.
   > 
   > @CalvinKirs Is CI OK now, only one task is canceled by timeout.
   
   don't worry~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1127138876

   Can this plugin support custom header( or custom fields) in config now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] CalvinKirs commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r880060333


##########
seatunnel-dist/release-docs/LICENSE:
##########
@@ -896,6 +896,7 @@ The text of each license is the standard Apache 2.0 license.
      (The Apache Software License, Version 2.0) transport-netty4 (org.elasticsearch.plugin:transport-netty4-client:7.5.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) x-content (org.elasticsearch:elasticsearch-x-content:6.3.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) zookeeper (org.apache.zookeeper:zookeeper:3.4.10 - no url defined)
+     (The Apache Software License, Version 2.0) spark-excel (com.crealytics:spark-excel_2.11:0.11.1 - https://github.com/crealytics/spark-excel)

Review Comment:
   Other licenses also need to be added here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1138060978

   @CalvinKirs Hi, please check if the license is legal


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] tmljob commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
tmljob commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r873310657


##########
docs/en/connector/source/Excel.md:
##########
@@ -0,0 +1,49 @@
+# Excel
+
+## Description
+
+Read data from excel file
+
+:::tip
+
+Engine Supported and plugin name
+
+* [x] Spark: Excel
+* [ ] Flink:
+
+:::
+
+## Options
+
+| name              | type    | required | default value |
+| ----------------- | ------- | -------- | ------------- |
+| path              | string  | yes      |               |
+| options.useHeader | boolean | yes      |               |
+| options.*         |         |          |               |
+
+### path[string]
+
+file path, starting with file://
+
+### options.useHeader[boolean]
+
+Whether to use the first row of Excel as the header
+
+### options.*
+
+For details, refer to the parameters of the third-party library [spark-excel](https://github.com/crealytics/spark-excel).

Review Comment:
   
   
   
   
   > Can this plugin support custom header( or custom fields) in config now?
   
   1、You can set whether to use the first line as the header, but it cannot be customized
   2、You can set the area range for reading data through "DataAddress"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] wuchunfu commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
wuchunfu commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1126121455

   @tmljob Please resolve the conflicting files, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] BenJFan commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
BenJFan commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r873275387


##########
docs/en/connector/source/Excel.md:
##########
@@ -0,0 +1,49 @@
+# Excel
+
+## Description
+
+Read data from excel file
+
+:::tip
+
+Engine Supported and plugin name
+
+* [x] Spark: Excel
+* [ ] Flink:
+
+:::
+
+## Options
+
+| name              | type    | required | default value |
+| ----------------- | ------- | -------- | ------------- |
+| path              | string  | yes      |               |
+| options.useHeader | boolean | yes      |               |
+| options.*         |         |          |               |
+
+### path[string]
+
+file path, starting with file://
+
+### options.useHeader[boolean]
+
+Whether to use the first row of Excel as the header
+
+### options.*
+
+For details, refer to the parameters of the third-party library [spark-excel](https://github.com/crealytics/spark-excel).

Review Comment:
   After read this document, I can't know how to set options.*, I think move there option to here is better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] tmljob commented on a diff in pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
tmljob commented on code in PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#discussion_r880136852


##########
seatunnel-dist/release-docs/LICENSE:
##########
@@ -896,6 +896,7 @@ The text of each license is the standard Apache 2.0 license.
      (The Apache Software License, Version 2.0) transport-netty4 (org.elasticsearch.plugin:transport-netty4-client:7.5.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) x-content (org.elasticsearch:elasticsearch-x-content:6.3.1 - https://github.com/elastic/elasticsearch)
      (The Apache Software License, Version 2.0) zookeeper (org.apache.zookeeper:zookeeper:3.4.10 - no url defined)
+     (The Apache Software License, Version 2.0) spark-excel (com.crealytics:spark-excel_2.11:0.11.1 - https://github.com/crealytics/spark-excel)

Review Comment:
   I introduced the spark-excel tripartite package. In addition to listing the license of spark-excel, I also need to list the licesne of the library it depends on, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] tmljob commented on pull request #1872: [Feature][Connector]Spark connector for excel Source

Posted by GitBox <gi...@apache.org>.
tmljob commented on PR #1872:
URL: https://github.com/apache/incubator-seatunnel/pull/1872#issuecomment-1132852425

   > > > @BenJFan @wuchunfu @CalvinKirs Is there anything else that needs to be modified in my PR? Why can't it be merged into dev?
   > > 
   > > 
   > > can you resolve CI fail first?
   > 
   > I don't understand why the build fails under ubuntu, and no clear error message is provided.
   
   @CalvinKirs 
   Is CI OK now, only one task is canceled by timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org