You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/08 09:51:38 UTC

[GitHub] [drill] jnturton opened a new pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

jnturton opened a new pull request #2424:
URL: https://github.com/apache/drill/pull/2424


   # [DRILL-8092](https://issues.apache.org/jira/browse/DRILL-8092): Add Auto Pagination to HTTP Storage Plugin
   
   ## Description
   
   This PR is a follow-up to #2414 and refactors the Paginator classes in that PR.  No functionality is changed.
    
   ## Documentation
   See #2414.
   
   ## Testing
   See #2414.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2424:
URL: https://github.com/apache/drill/pull/2424#issuecomment-1008238421


   @paul-rogers yes, of course. I'll see if I can expand the scope of this refactoring PR to add that in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2424:
URL: https://github.com/apache/drill/pull/2424#issuecomment-1016222748


   > @paul-rogers yes, of course. I'll see if I can expand the scope of this refactoring PR to add that in.
   
   In the paginated case, a separate batch reader is created for each page of data returned by the HTTP API.
   
   - This plugin's JSON batch reader uses JsonLoader from EVF which will try to fill up a Drill batch for each page of API data.
   - Its CSV batch reader makes direct use of the Univocity CSV parse rather than going via CompliantTextBatchReader (why?) but includes its own batch-filling loop.
   - Its XML batch reader uses XmlReader from the format-xml which is also EVF-based and tries to fill up a Drill batch.
   
   So I think the "A natural structure is to create one Drill batch per HTTP page" comment is already addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2424:
URL: https://github.com/apache/drill/pull/2424#discussion_r786652254



##########
File path: contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/paginator/PagePaginator.java
##########
@@ -64,16 +60,16 @@ public PagePaginator(Builder builder, int limit, int pageSize, String pageParam,
   }
 
   @Override
-  public String next() {
-    if (hasLimit) {
-      return super.next();
-    } else {
-      return generateNextUrl();
-    }
+  public boolean hasNext() {
+    return !partialPageReceived && (limit < 0 || (currentPage-1) * pageSize < limit);

Review comment:
       @paul-rogers I can't see that LIMIT 0 is disallowed.  This `hasNext()` method above returns false immediately in this case and 0 records are returned for the query, is that not correct?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton merged pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton merged pull request #2424:
URL: https://github.com/apache/drill/pull/2424


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2424:
URL: https://github.com/apache/drill/pull/2424#discussion_r786651480



##########
File path: contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/paginator/PagePaginator.java
##########
@@ -47,11 +44,10 @@
    * @param pageSizeParam The API Query parameter which specifies how many results per page
    */
   public PagePaginator(Builder builder, int limit, int pageSize, String pageParam, String pageSizeParam) {
-    super(builder, paginationMode.PAGE, pageSize, limit > 0);
+    super(builder, paginationMode.PAGE, pageSize, limit);
     this.limit = limit;
     this.pageParam = pageParam;
     this.pageSizeParam = pageSizeParam;
-    this.paginatedUrls = buildPaginatedURLs();
     currentPage = 1;

Review comment:
       @cgivre While testing this PR today I encountered an API that starts counting pages at 0.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on a change in pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #2424:
URL: https://github.com/apache/drill/pull/2424#discussion_r780724393



##########
File path: contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/paginator/PagePaginator.java
##########
@@ -64,16 +60,16 @@ public PagePaginator(Builder builder, int limit, int pageSize, String pageParam,
   }
 
   @Override
-  public String next() {
-    if (hasLimit) {
-      return super.next();
-    } else {
-      return generateNextUrl();
-    }
+  public boolean hasNext() {
+    return !partialPageReceived && (limit < 0 || (currentPage-1) * pageSize < limit);

Review comment:
       Limit code is off. A limit of 0 should be allowed: it means schema only. Fixed in the EVF V2/Easy Format Plugin PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2424: DRILL-8092: Add Auto Pagination to HTTP Storage Plugin

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2424:
URL: https://github.com/apache/drill/pull/2424#issuecomment-1016228933


   Since I'm not aware of any outstanding concerns I'll merge this once the CI test run passes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org