Posted to commits@seatunnel.apache.org by "MonsterChenzhuo (via GitHub)" <gi...@apache.org> on 2023/03/20 02:06:32 UTC

[GitHub] [incubator-seatunnel] MonsterChenzhuo opened a new pull request, #4372: [WIP][Feature][Connector] http connector support pagable

MonsterChenzhuo opened a new pull request, #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372

   
   ## Purpose of this pull request
   
   <!-- Describe the purpose of this pull request. For example: This pull request adds checkstyle plugin.-->
   
   ## Check list
   
   * [ ] Code changes are covered with tests, or they do not need tests for the following reason:
   * [ ] If any new JAR binary package is added in your PR, please add the License Notice according to the
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   * [ ] If you are contributing the connector code, please check that the following files are updated:
     1. Update the change log in the connector document. For more details, refer to [connector-v2](https://github.com/apache/incubator-seatunnel/tree/dev/docs/en/connector-v2)
     2. Update [plugin-mapping.properties](https://github.com/apache/incubator-seatunnel/blob/dev/plugin-mapping.properties) and add the new connector's information to it
     3. Update the pom file of [seatunnel-dist](https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-dist/pom.xml)
   * [ ] Update the [`release-note`](https://github.com/apache/incubator-seatunnel/blob/dev/release-note.md).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] mianhuanwu commented on a diff in pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "mianhuanwu (via GitHub)" <gi...@apache.org>.
mianhuanwu commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1143082350


##########
seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/HttpPage.java:
##########
@@ -0,0 +1,15 @@
+package org.apache.seatunnel.connectors.seatunnel.http.config;
+
+import lombok.Builder;
+import lombok.Data;
+
+import java.io.Serializable;
+
+@Data
+@Builder
+public class HttpPage implements Serializable {
+
+    private String pageNum;
+    private String pageField;
+    private String paheSize;

Review Comment:
   Misspelled.





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149950479


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   @EricJoy2048 @TyrantLucifer @hailin0 @ic4y @Hisoka-X @liugddx @wuchunfu PTAL





[GitHub] [incubator-seatunnel] liugddx commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "liugddx (via GitHub)" <gi...@apache.org>.
liugddx commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149958674


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   > The disadvantage of only providing the page range is that the connector cannot auto-increment the page number to traverse the whole data set, so users need to know exactly which pages they want to synchronize. The advantage is that it places no requirement on the structure of the user's response body.
   
   You can get the specified key through JSONPath, e.g. `jsonPath.get("*totalPage")`; `totalPage` can then be used as a parameter.
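   
   Here is a minimal sketch of that idea. It assumes the response body has already been parsed into nested `Map`/`List` objects. A real implementation would more likely use a JSONPath library; in Jayway JsonPath, for example, the deep scan `$..totalPage` returns every match. `DeepScan` is a hypothetical name, and unlike a JSONPath deep scan it returns only the first match it finds:
   
   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.Optional;
   
   /**
    * Hypothetical helper (not SeaTunnel code): searches a parsed JSON tree made of
    * nested Maps and Lists for the first value stored under {@code key}, at any depth.
    */
   final class DeepScan {
   
       private DeepScan() {}
   
       static Optional<Object> find(Object node, String key) {
           if (node instanceof Map) {
               Map<?, ?> map = (Map<?, ?>) node;
               // Check this object's own fields before descending into its children.
               if (map.containsKey(key)) {
                   return Optional.ofNullable(map.get(key)); // a JSON null yields empty
               }
               for (Object child : map.values()) {
                   Optional<Object> hit = find(child, key);
                   if (hit.isPresent()) {
                       return hit;
                   }
               }
           } else if (node instanceof List) {
               for (Object child : (List<?>) node) {
                   Optional<Object> hit = find(child, key);
                   if (hit.isPresent()) {
                       return hit;
                   }
               }
           }
           return Optional.empty(); // scalars have no fields to search
       }
   }
   ```
   
   The key is found wherever it sits, so the same call works for `{"data": {"totalPage": 4}}` and for a flat `{"totalPage": 4}`. The trade-off is that an unrelated nested field with the same name could be picked up first.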
   
   





[GitHub] [seatunnel] ic4y commented on pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "ic4y (via GitHub)" <gi...@apache.org>.
ic4y commented on PR #4372:
URL: https://github.com/apache/seatunnel/pull/4372#issuecomment-1661405040

   @MonsterChenzhuo  Can we continue to advance this PR now?




[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149948596


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   At present, there are two common formats:
   <img width="1144" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106583-f8fbcb64-64b4-412e-9700-a49fb2886240.png">
   <img width="574" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106749-7f4eba17-2e94-4fa4-8431-0d67be29309e.png">
   
   To get `totalPage`, I need to:
   1. First, call the interface: `response = httpClient.execute(httpParameter);`
   2. Parse the response body and read the value of `totalPage`:
   `ObjectMapper mapper = new ObjectMapper();`
   `JsonNode root = mapper.readTree(response.getContent());`
   `int totalPage = root.get("data").get("totalPageField").asInt();`
   





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149949547


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   The sticking point at present is that the user's response body must follow a fixed JSON structure like the one below; if the user's structure does not match, this feature fails outright.
   "data":{
           "total":36,
           "totalPage":4,
           "pageSize":10,
           "currentPage":1,
           "start":0
   }





[GitHub] [incubator-seatunnel] TyrantLucifer commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "TyrantLucifer (via GitHub)" <gi...@apache.org>.
TyrantLucifer commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1147673580


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   How would the connector know the total page count of an interface? Different interfaces expose the total page count in different, non-standard ways, so on the connector side we should only offer config options that tell the connector which page numbers it needs to read.





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1145858156


##########
seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/HttpPage.java:
##########
@@ -0,0 +1,15 @@
+package org.apache.seatunnel.connectors.seatunnel.http.config;
+
+import lombok.Builder;
+import lombok.Data;
+
+import java.io.Serializable;
+
+@Data
+@Builder
+public class HttpPage implements Serializable {
+
+    private String pageNum;
+    private String pageField;
+    private String paheSize;

Review Comment:
   This PR is [WIP] and still under development.





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149948596


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   At present, there are two common formats:
   <img width="1144" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106583-f8fbcb64-64b4-412e-9700-a49fb2886240.png">
   <img width="574" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106749-7f4eba17-2e94-4fa4-8431-0d67be29309e.png">
   
   To get `totalPage`, I need to:
   1. First, call the interface: `response = httpClient.execute(httpParameter);`
   2. Parse the response body, read the value of `totalPage`, and then loop over the remaining pages:
   `ObjectMapper mapper = new ObjectMapper();`
   `JsonNode root = mapper.readTree(response.getContent());`
   `int totalPage = root.get("data").get("totalPageField").asInt();`
   `for (int pageNumber = 2; pageNumber <= totalPage; pageNumber++) {`
   `    // Loop through to fetch each page's content`
   `}`
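   
   The two-step flow above can be sketched without any HTTP or JSON library: read `totalPage` from the first response, then request pages 2 to `totalPage`. This is a hypothetical sketch rather than the PR's code. `PageResult` and `TotalPagePaging` are invented names, and the `fetchPage` callback stands in for executing the request for a given page number and parsing the body (for example with Jackson's `ObjectMapper`, as above):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.IntFunction;
   
   /** Hypothetical result of one page request: its rows plus the total page count the API reports. */
   final class PageResult {
       final List<String> rows;
       final int totalPage;
   
       PageResult(List<String> rows, int totalPage) {
           this.rows = rows;
           this.totalPage = totalPage;
       }
   }
   
   /** Hypothetical sketch: read page 1, learn totalPage from it, then fetch pages 2..totalPage. */
   final class TotalPagePaging {
   
       private TotalPagePaging() {}
   
       static List<String> readAll(IntFunction<PageResult> fetchPage) {
           PageResult first = fetchPage.apply(1); // the first request also reports totalPage
           List<String> all = new ArrayList<>(first.rows);
           for (int pageNumber = 2; pageNumber <= first.totalPage; pageNumber++) {
               all.addAll(fetchPage.apply(pageNumber).rows);
           }
           return all;
       }
   }
   ```
   
   Keeping the fetch behind a callback separates the paging logic from the response format. That format is exactly the part that differs between APIs.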





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1480799569

   > Hi, how's it going?
   
   It is currently used like this:
   source {
     Http {
       url = "http://mockserver:1080/paging/mock"
       method = "GET"
       format = "json"
       paging = {
           pageNoField = "pageNo",
           pageNo = "1,2,4,10-50",   // Supports single pages and page ranges
           pageSizeField = "pageSize",
           pageSize = 2
       }
       schema = {
         fields {
           name = string
           age = string
         }
       }
     }
   }
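   
   The sketch below shows how a page spec such as `"1,2,4,10-50"` could be expanded into the individual page numbers to request. `PageRanges` is a hypothetical helper. Sorting the pages, dropping duplicates, and rejecting pages below 1 are assumptions, not behavior confirmed by the PR:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.TreeSet;
   
   /** Hypothetical helper: expands a spec like "1,2,4,10-50" into page numbers. */
   final class PageRanges {
   
       private PageRanges() {}
   
       static List<Integer> parse(String spec) {
           // A TreeSet drops duplicates (e.g. "1-3,2") and keeps pages in ascending order.
           TreeSet<Integer> pages = new TreeSet<>();
           for (String raw : spec.split(",")) {
               String part = raw.trim();
               if (part.isEmpty()) {
                   continue; // tolerate stray commas such as "1,,2"
               }
               int dash = part.indexOf('-');
               int start;
               int end;
               if (dash < 0) {
                   start = end = Integer.parseInt(part); // single page, e.g. "4"
               } else {
                   start = Integer.parseInt(part.substring(0, dash).trim()); // range, e.g. "8-10"
                   end = Integer.parseInt(part.substring(dash + 1).trim());
               }
               if (start < 1 || end < start) {
                   throw new IllegalArgumentException("Invalid page spec: " + part);
               }
               for (int page = start; page <= end; page++) {
                   pages.add(page); // ranges are inclusive at both ends
               }
           }
           return new ArrayList<>(pages);
       }
   }
   ```
   
   Each resulting number then becomes one request with `pageNo` set to that value. The spec `1-2,4,8-10`, for example, leads to six requests.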




[GitHub] [seatunnel] MonsterChenzhuo closed pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo closed pull request #4372: [WIP][Feature][Connector] http connector support pagable
URL: https://github.com/apache/seatunnel/pull/4372




[GitHub] [seatunnel] jobmission commented on pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "jobmission (via GitHub)" <gi...@apache.org>.
jobmission commented on PR #4372:
URL: https://github.com/apache/seatunnel/pull/4372#issuecomment-1641359841

   @MonsterChenzhuo  hi 




[GitHub] [incubator-seatunnel] TyrantLucifer commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "TyrantLucifer (via GitHub)" <gi...@apache.org>.
TyrantLucifer commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1147086678


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -34,6 +34,10 @@ Used to read data from Http.
 | retry_backoff_multiplier_ms | int    | No       | 100           |
 | retry_backoff_max_ms        | int    | No       | 10000         |
 | common-options              |        | No       | -             |
+| paging.pageNoField          | String | No       | -             |

Review Comment:
   Use snake_case (underscores), not camelCase.





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149950398


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   The disadvantage of only providing the page range is that the connector cannot auto-increment the page number to traverse the whole data set, so users need to know exactly which pages they want to synchronize. The advantage is that it places no requirement on the structure of the user's response body.





[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #4372: [Feature][Connector] http connector support pagable

Posted by "EricJoy2048 (via GitHub)" <gi...@apache.org>.
EricJoy2048 commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1492859013

   Please fix the CI problem.




[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1480810328

   > Hi, how's it going?
   
   There is a problem with auto-incrementing the page number: once paging runs past the last page and no more data can be fetched, the response body the API returns differs from company to company, which makes it difficult to adapt to.
   <img width="1168" alt="图片" src="https://user-images.githubusercontent.com/60029759/227152001-dc1fac5b-4e21-444b-bcfe-34494a7c567f.png">
   




[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "EricJoy2048 (via GitHub)" <gi...@apache.org>.
EricJoy2048 commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149944525


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   > > How would the connector know the total page count of an interface? Different interfaces expose the total page count in different, non-standard ways, so on the connector side we should only offer config options that tell the connector which page numbers it needs to read.
   > 
   > We add `pageNoField` to map which field holds the page number, so why not add a parameter that lets the user specify which field holds the total page count?
   
   @hailin0  @ic4y  @Hisoka-X  @liugddx  @wuchunfu  PTAL





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149948596


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   At present, there are two common formats:
   <img width="1144" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106583-f8fbcb64-64b4-412e-9700-a49fb2886240.png">
   <img width="574" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106749-7f4eba17-2e94-4fa4-8431-0d67be29309e.png">
   
   To get `totalPage`, I need to:
   1. First, call the interface: `response = httpClient.execute(httpParameter);`
   2. Parse the response body, read the value of `totalPage`, and then loop over the remaining pages:
   `ObjectMapper mapper = new ObjectMapper();`
   `JsonNode root = mapper.readTree(response.getContent());`
   `int totalPage = root.get("data").get("totalPageField").asInt();`
   `for (int pageNumber = 2; pageNumber <= totalPage; pageNumber++) {`
   `    // Loop through to fetch each page's content`
   `}`
   
   Sample SeaTunnel configurations are as follows:
   
   Example 1: read data by page ranges
   
   ```hocon
   Http {
     url = "https://tyrantlucifer.com/api/getDemoData"
     pageing = {
       page_no_field = "pageNo",
       page_no = "1-2,4,8-10",
       page_size_field = "pageSize",
       page_size = 10
     }
     schema {
       fields {
         code = int
         message = string
         data = string
         ok = boolean
       }
     }
   }
   ```
   
   Example 2: auto-increment the page number until no more data is fetched
   
   ```hocon
   Http {
     url = "https://tyrantlucifer.com/api/getDemoData"
     pageing = {
       page_no_field = "pageNo",
       page_size_field = "pageSize",
       page_size = 10,
       body_field = "data",
       total_page_field = "totalPage"
     }
     schema {
       fields {
         code = int
         message = string
         data = string
         ok = boolean
       }
     }
   }
   ```
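   
   The second configuration above (self-incrementing paging until no data is fetched) can be sketched as a loop that keeps incrementing the page number until a page comes back empty. This is a hypothetical sketch, not the PR's code:
   
   - `AutoIncrementPaging` is an invented name.
   - `maxPages` is an added safety cap.
   - `fetchRows` stands in for the HTTP GET plus extracting the rows under `body_field`.
   - Treating a page shorter than `page_size` as the last one assumes the API honors the requested page size.
   - `URLEncoder.encode(String, Charset)` requires Java 10 or later.
   
   ```java
   import java.net.URLEncoder;
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Function;
   
   /** Hypothetical sketch of auto-incrementing pagination over an HTTP API. */
   final class AutoIncrementPaging {
   
       private AutoIncrementPaging() {}
   
       /** Appends pageNoField=pageNo and pageSizeField=pageSize as query parameters. */
       static String pageUrl(String baseUrl, String pageNoField, int pageNo,
                             String pageSizeField, int pageSize) {
           String separator = baseUrl.contains("?") ? "&" : "?";
           return baseUrl + separator
                   + URLEncoder.encode(pageNoField, StandardCharsets.UTF_8) + "=" + pageNo
                   + "&" + URLEncoder.encode(pageSizeField, StandardCharsets.UTF_8) + "=" + pageSize;
       }
   
       /** Requests pages 1, 2, 3, ... and stops at the first empty or short page, or at maxPages. */
       static List<String> readUntilEmpty(String baseUrl, String pageNoField, String pageSizeField,
                                          int pageSize, int maxPages,
                                          Function<String, List<String>> fetchRows) {
           List<String> all = new ArrayList<>();
           for (int pageNo = 1; pageNo <= maxPages; pageNo++) {
               List<String> rows = fetchRows.apply(
                       pageUrl(baseUrl, pageNoField, pageNo, pageSizeField, pageSize));
               if (rows.isEmpty()) {
                   break; // an empty page marks the end of the data set
               }
               all.addAll(rows);
               if (rows.size() < pageSize) {
                   break; // a short page is assumed to be the last one
               }
           }
           return all;
       }
   }
   ```
   
   The cap matters because detecting "no more data" depends on what the API returns past the last page. If an API keeps returning the last page instead of an empty one, the loop would otherwise never end.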





[GitHub] [incubator-seatunnel] jobmission commented on a diff in pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "jobmission (via GitHub)" <gi...@apache.org>.
jobmission commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1145628447


##########
seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/HttpPage.java:
##########
@@ -0,0 +1,15 @@
+package org.apache.seatunnel.connectors.seatunnel.http.config;
+
+import lombok.Builder;
+import lombok.Data;
+
+import java.io.Serializable;
+
+@Data
+@Builder
+public class HttpPage implements Serializable {
+
+    private String pageNum;
+    private String pageField;
+    private String paheSize;

Review Comment:
   paheSize -> pageSize ?





[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149948596


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   At present, there are two common formats:
   <img width="1144" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106583-f8fbcb64-64b4-412e-9700-a49fb2886240.png">
   <img width="574" alt="图片" src="https://user-images.githubusercontent.com/60029759/228106749-7f4eba17-2e94-4fa4-8431-0d67be29309e.png">
   
   To get `totalPage`, I need to:
   1. First, call the interface: `response = httpClient.execute(httpParameter);`
   2. Parse the response body and read the value of `totalPage`:
   `ObjectMapper mapper = new ObjectMapper();`
   `JsonNode root = mapper.readTree(response.getContent());`
   `int totalPage = root.get("data").get("totalPageField").asInt();`







[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1495937873

   @EricJoy2048 @TyrantLucifer Sorry, I've been making changes recently. I'm now going to adopt @liugddx's suggestion to use `jsonPath.get("*totalPage")`.




[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "EricJoy2048 (via GitHub)" <gi...@apache.org>.
EricJoy2048 commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149941247


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   > How does the connector know the total page count of the interface? Different interfaces expose the total page count in different ways, so on the connector side we should only offer config options that tell the connector which page numbers to read.
   
   We added `pageNoField` to map which field holds the page number, so why not add a parameter that lets the user specify which field holds totalPage?
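One way to read this suggestion: a hypothetical `totalPageField` option names the field that holds the page count, the first response supplies it, and the remaining pages are then requested in order. This is a sketch under stated assumptions, not the PR's code; `PageFetcher` is a hypothetical stand-in for the HTTP request plus JSON parsing, and it returns the object that carries the paging metadata (for example the `data` object).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for "send the request for page N and parse the JSON
// object that carries the paging metadata" (e.g. the "data" object).
interface PageFetcher {
    Map<String, Object> fetch(int pageNo);
}

final class TotalPageReader {

    private TotalPageReader() {}

    // Reads page 1, takes the page count from the field named by
    // totalPageField, then reads pages 2..totalPage.
    static List<Map<String, Object>> readAll(PageFetcher fetcher, String totalPageField) {
        List<Map<String, Object>> pages = new ArrayList<>();
        Map<String, Object> first = fetcher.fetch(1);
        pages.add(first);

        Object total = first.get(totalPageField);
        if (!(total instanceof Number)) {
            // Fail fast instead of guessing when the field is absent or not numeric.
            throw new IllegalStateException(
                    "Response has no numeric field '" + totalPageField + "'");
        }
        int totalPage = ((Number) total).intValue();

        // Page 1 is already in hand, so continue from page 2.
        for (int pageNo = 2; pageNo <= totalPage; pageNo++) {
            pages.add(fetcher.fetch(pageNo));
        }
        return pages;
    }
}
```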




[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149950479


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   cc @EricJoy2048 @TyrantLucifer  




[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149949547


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   The sticking point at present is that the user's response body must match this JSON structure; if it does not, this feature fails outright.
   ```json
   {
     "data": {
       "total": 36,
       "totalPage": 4,
       "pageSize": 10,
       "currentPage": 1,
       "start": 0
     }
   }
   ```
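The coupling described above can at least be made explicit: look up the page count at a fixed path and fail with a clear message when the response has a different shape. A minimal sketch, assuming the response is already parsed into nested `Map`s; `FixedPathLookup` and the dot-separated path syntax (`data.totalPage`) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: follows a dot-separated path (e.g. "data.totalPage")
// through nested Maps. Returns empty if any segment is missing or is not an object.
final class FixedPathLookup {

    private FixedPathLookup() {}

    static Optional<Object> get(Map<String, Object> root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return Optional.empty(); // structure does not match the expected shape
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return Optional.ofNullable(current);
    }

    // Reads the page count, failing loudly when the response shape differs.
    static int requireTotalPage(Map<String, Object> root, String path) {
        return get(root, path)
                .filter(v -> v instanceof Number)
                .map(v -> ((Number) v).intValue())
                .orElseThrow(() -> new IllegalStateException(
                        "Expected a numeric value at '" + path + "'"));
    }
}
```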




[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1149948596


##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   There are currently two common formats:
   <img width="1144" alt="image" src="https://user-images.githubusercontent.com/60029759/228106583-f8fbcb64-64b4-412e-9700-a49fb2886240.png">
   <img width="574" alt="image" src="https://user-images.githubusercontent.com/60029759/228106749-7f4eba17-2e94-4fa4-8431-0d67be29309e.png">
   
   To get totalPage, I need to:
   1. Request the interface once (`response = httpClient.execute(httpParameter);`).
   2. Parse the response body, read the value of totalPage, and then loop over the remaining pages:
   
   ```java
   ObjectMapper mapper = new ObjectMapper();
   JsonNode root = mapper.readTree(response.getContent());
   // Read the page count from data.totalPage (the field named by total_page_field)
   int totalPage = root.get("data").get("totalPage").asInt();
   // Page 1 was already fetched by the first request, so start from page 2
   for (int pageNumber = 2; pageNumber <= totalPage; pageNumber++) {
       // Loop through to get paging content
   }
   ```
    
    
   Sample SeaTunnel configurations are shown below.
   
   Sample 1: read data by page ranges
   
   ```hocon
   Http {
     url = "https://test.com/api/getDemoData"
     method = "GET"
     format = "json"
     pageing = {
       page_no_field = "pageNo",
       page_no = "1-2,4,8-10",
       page_size_field = "pageSize",
       page_size = 10
     }
     schema {
       fields {
         code = int
         message = string
         data = string
         ok = boolean
       }
     }
   }
   ```
   
   Sample 2: increment the page number until no more data is returned
   
   ```hocon
   Http {
     url = "https://test.com/api/getDemoData"
     method = "GET"
     format = "json"
     pageing = {
       page_no_field = "pageNo",
       page_size_field = "pageSize",
       page_size = 10,
       body_field = "data",
       total_page_field = "totalPage"
     }
     schema {
       fields {
         code = int
         message = string
         data = string
         ok = boolean
       }
     }
   }
   ```
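The stop condition of the second sample (self-incrementing paging) might look like the sketch below. It assumes a hypothetical `RecordFetcher` that returns the records of one page, with an empty list once the data runs out, plus an optional `totalPage` cap for responses that report one. It illustrates the described behaviour rather than the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical stand-in for "request page N and extract its records".
interface RecordFetcher<T> {
    List<T> fetch(int pageNo);
}

final class IncrementalPager {

    private IncrementalPager() {}

    // Requests pages 1, 2, 3, ... until a page comes back empty or,
    // when totalPage is known, until that page has been read.
    static <T> List<T> readUntilEmpty(RecordFetcher<T> fetcher, OptionalInt totalPage) {
        List<T> records = new ArrayList<>();
        for (int pageNo = 1; ; pageNo++) {
            if (totalPage.isPresent() && pageNo > totalPage.getAsInt()) {
                break; // reached the page count reported by the server
            }
            List<T> page = fetcher.fetch(pageNo);
            if (page.isEmpty()) {
                break; // no more data
            }
            records.addAll(page);
        }
        return records;
    }
}
```

Without a `totalPage` cap the loop relies on the server eventually returning an empty page, so a production version would probably also want a hard upper bound.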




[GitHub] [incubator-seatunnel] jobmission commented on pull request #4372: [WIP][Feature][Connector] http connector support pagable

Posted by "jobmission (via GitHub)" <gi...@apache.org>.
jobmission commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1479555663

   Hi, how's it going?



[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1480812568

   > Hi, how's it going?
   
   So, for now only this form is supported: `pageNo = "1,2,4,10-50"` (single pages and page ranges).
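The `pageNo` format above (single pages and inclusive ranges, separated by commas) could be expanded into concrete page numbers as follows. A minimal sketch: `PageSpecParser` is hypothetical, and its handling of whitespace, empty items, and descending ranges is an assumption, not what the PR implements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for page specs such as "1,2,4,10-50" or "1-2,4,8-10".
// Each comma-separated item is either a single page or an inclusive range.
final class PageSpecParser {

    private PageSpecParser() {}

    static List<Integer> parse(String spec) {
        List<Integer> pages = new ArrayList<>();
        for (String item : spec.split(",")) {
            String token = item.trim();
            if (token.isEmpty()) {
                continue; // tolerate stray commas such as "1,,2"
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                pages.add(Integer.parseInt(token)); // single page
                continue;
            }
            int start = Integer.parseInt(token.substring(0, dash).trim());
            int end = Integer.parseInt(token.substring(dash + 1).trim());
            if (start > end) {
                throw new IllegalArgumentException("Descending range: " + token);
            }
            for (int page = start; page <= end; page++) {
                pages.add(page); // both ends inclusive
            }
        }
        return pages;
    }
}
```

Duplicates are kept as written, so `"1,1-2"` would request page 1 twice; deduplicating would be a separate design decision.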



[GitHub] [incubator-seatunnel] MonsterChenzhuo commented on pull request #4372: [Feature][Connector] http connector support pagable

Posted by "MonsterChenzhuo (via GitHub)" <gi...@apache.org>.
MonsterChenzhuo commented on PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#issuecomment-1482169096

   cc @TyrantLucifer  PTAL, thanks.



[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #4372: [Feature][Connector] http connector support pagable

Posted by "EricJoy2048 (via GitHub)" <gi...@apache.org>.
EricJoy2048 commented on code in PR #4372:
URL: https://github.com/apache/incubator-seatunnel/pull/4372#discussion_r1147246611


##########
seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/PagingField.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.http.config;
+
+import org.apache.seatunnel.api.configuration.util.OptionMark;
+
+import lombok.Builder;
+import lombok.Data;
+
+import java.io.Serializable;
+import java.util.Map;
+
+@Data

Review Comment:
   Same as above.



##########
docs/en/connector-v2/source/Http.md:
##########
@@ -34,6 +34,10 @@ Used to read data from Http.
 | retry_backoff_multiplier_ms | int    | No       | 100           |
 | retry_backoff_max_ms        | int    | No       | 10000         |
 | common-options              |        | No       | -             |
+| paging.pageNoField          | String | No       | -             |
+| paging.pageNo               | String | No       | -             |
+| paging.pageSizeField        | String | No       | -             |
+| paging.pageSize             | int    | No       | -             |

Review Comment:
   I think you only need to add one option, `pageing`, to this table, with type `Object` like the `schema` option.
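Grouping the four `paging.*` settings into a single object option could look like the holder below, which also shows how `pageNoField` and `pageSizeField` name the request parameters. A minimal sketch assuming Java 10+ (for `URLEncoder.encode(String, Charset)`); `PagingOptions` and its query-string layout are hypothetical, not SeaTunnel's `PagingField` or `HttpParameter`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical holder for the grouped `pageing` option.
final class PagingOptions {
    final String pageNoField;   // request parameter that carries the page number
    final String pageSizeField; // request parameter that carries the page size
    final int pageSize;

    PagingOptions(String pageNoField, String pageSizeField, int pageSize) {
        this.pageNoField = pageNoField;
        this.pageSizeField = pageSizeField;
        this.pageSize = pageSize;
    }

    // Builds the query parameters for one page, in a stable order.
    Map<String, String> paramsFor(int pageNo) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put(pageNoField, String.valueOf(pageNo));
        params.put(pageSizeField, String.valueOf(pageSize));
        return params;
    }

    // Encodes the parameters as "name=value&name=value".
    String queryFor(int pageNo) {
        return paramsFor(pageNo).entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```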



##########
docs/en/connector-v2/source/Http.md:
##########
@@ -289,6 +293,32 @@ Http {
 }
 ```
 
+### page options
+
+## Example
+
+simple:
+
+```hocon
+Http {
+  url = "https://tyrantlucifer.com/api/getDemoData"
+  pageing = {
+    pageNoField = "pageNo",
+    pageNo = "1-2,4,8-10",

Review Comment:
   If I want to read the data of every page, how should I configure this option?



##########
seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/HttpParameter.java:
##########
@@ -25,18 +25,25 @@
 import java.util.Map;
 import java.util.stream.Collectors;
 
+import static org.apache.seatunnel.connectors.seatunnel.http.config.HttpConfig.PAGE_NO;
+import static org.apache.seatunnel.connectors.seatunnel.http.config.HttpConfig.PAGE_NO_FIELD;
+import static org.apache.seatunnel.connectors.seatunnel.http.config.HttpConfig.PAGE_SIZE;
+import static org.apache.seatunnel.connectors.seatunnel.http.config.HttpConfig.PAGE_SIZE_FIELD;
+
 @Data

Review Comment:
   Using `@Getter` and `@Setter` is better, because `@Data` may cause problems if this object is used as a key in a Map.
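The concern with `@Data` is that Lombok generates setters together with `equals`/`hashCode` computed from the (mutable) fields, so changing an object after using it as a `HashMap` key changes its hash and the entry can no longer be found. The hand-written sketch below reproduces that without Lombok; `MutableKey` is hypothetical and only mimics what `@Data` would generate for one field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hand-written equivalent of what @Data produces for one field:
// a setter plus equals/hashCode computed from the mutable field.
final class MutableKey {
    private String pageNo;

    MutableKey(String pageNo) {
        this.pageNo = pageNo;
    }

    void setPageNo(String pageNo) {
        this.pageNo = pageNo;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(pageNo, ((MutableKey) o).pageNo);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(pageNo);
    }
}

final class MutableKeyDemo {

    private MutableKeyDemo() {}

    // Stores a value under `key`, mutates the key, then looks it up again.
    static boolean lookupAfterMutation() {
        Map<MutableKey, String> cache = new HashMap<>();
        MutableKey key = new MutableKey("1");
        cache.put(key, "page one");
        key.setPageNo("2"); // hash code changes while the key sits in the map
        return cache.containsKey(key);
    }
}
```

With only `@Getter`/`@Setter`, `equals`/`hashCode` stay identity-based (inherited from `Object`), so the same instance is still found after a setter call.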


