Posted to issues@drill.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/12/24 04:34:00 UTC

[jira] [Commented] (DRILL-8092) Add Auto Pagination to HTTP Storage Plugin

    [ https://issues.apache.org/jira/browse/DRILL-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464891#comment-17464891 ] 

ASF GitHub Bot commented on DRILL-8092:
---------------------------------------

cgivre opened a new pull request #2414:
URL: https://github.com/apache/drill/pull/2414


   # [DRILL-8092](https://issues.apache.org/jira/browse/DRILL-8092): Add Auto Pagination to HTTP Storage Plugin
   
   ## Description
   This PR lets Drill query APIs that paginate their results.  For example, if an API limits responses to 100 records per page, Drill can now issue multiple HTTP requests to retrieve the complete dataset.
   
   This update works in two modes: with a pushed-down limit and without one.  When a limit is pushed down, the new paginator object generates the required number of URLs and BatchReaders, executes the requests, and returns the results.  The requests currently run in series; future work could parallelize them.
   
   When no limit is pushed down, the reader keeps generating URLs and retrieving data until a request returns fewer rows than the page size.
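
The stopping rule for the no-limit case can be illustrated with a minimal Python sketch. This is not Drill's implementation (the plugin is written in Java); `fetch_all`, `fetch_page`, and `make_fake_api` are hypothetical names, and the "API" is an in-memory list standing in for real HTTP requests.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def fetch_all(fetch_page: Callable[[int, int], List[Row]], page_size: int) -> List[Row]:
    """Request pages until one comes back with fewer rows than page_size."""
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        # A short (or empty) page signals that the API has no more data.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


def make_fake_api(total_rows: int) -> Callable[[int, int], List[Row]]:
    """Hypothetical stand-in for an HTTP call: serves slices of an in-memory list."""
    data = [{"id": i} for i in range(total_rows)]
    return lambda offset, limit: data[offset:offset + limit]


all_rows = fetch_all(make_fake_api(230), page_size=100)
print(len(all_rows))
```

Note that when the total row count is an exact multiple of the page size, this rule costs one extra request, which returns an empty page and ends the loop.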
   
   ## Documentation
   (From README)
   Remote APIs frequently implement some sort of pagination as a way of limiting results.  However, if you are performing bulk data analysis, it is necessary to reassemble the 
   data into one larger dataset.  Drill's auto-pagination features allow this to happen in the background, so that the user will get clean data back.
   
   To use a paginator, you simply have to configure the paginator in the connection for the particular API.  
   
   ## Offset Pagination
   Offset pagination uses parameters similar to SQL's `LIMIT` and `OFFSET`.  For example, if you want 200 records and the maximum page size is 50 
   records, the offset paginator breaks your query into 4 requests as shown below:
   
   * myapi.com?limit=50&offset=0
   * myapi.com?limit=50&offset=50
   * myapi.com?limit=50&offset=100
   * myapi.com?limit=50&offset=150
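
As a rough illustration of how such a list of requests can be derived from a limit and a maximum page size, here is a small Python sketch. It is not the plugin's code; `offset_urls` and its parameter names are hypothetical, and trimming the final request to the remaining row count is an assumption of this sketch rather than documented Drill behavior.

```python
import math
from typing import List
from urllib.parse import urlencode


def offset_urls(base_url: str, limit: int, max_page_size: int,
                limit_field: str = "limit",
                offset_field: str = "offset") -> List[str]:
    """Build one URL per page needed to cover `limit` rows."""
    page_count = math.ceil(limit / max_page_size)
    urls = []
    for page in range(page_count):
        offset = page * max_page_size
        # Assumption: the last request asks only for the rows still missing.
        page_limit = min(max_page_size, limit - offset)
        query = urlencode({limit_field: page_limit, offset_field: offset})
        urls.append(f"{base_url}?{query}")
    return urls


for url in offset_urls("myapi.com", limit=200, max_page_size=50):
    print(url)
```

With a limit of 200 and a page size of 50 this produces four URLs with offsets 0, 50, 100, and 150.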
   
   ### Configuring Offset Pagination
   To configure an offset paginator, simply add the following to the configuration for your connection. 
   
   ```json
   "paginator": {
      "limitField": "<limit>",
      "offsetField": "<offset>",
      "maxPageSize": 100,
      "method": "OFFSET"
   }
   ```
   
   ## Page Pagination
   Page pagination is very similar to offset pagination, except that it uses a page number instead of an `OFFSET`.
   
   ```json
   "paginator": {
      "pageField": "page",
      "pageSizeField": "per_page",
      "maxPageSize": 100,
      "method": "PAGE"
   }
   ```
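
To make the mapping from this configuration to requests concrete, the following Python sketch generates page-style URLs using the `page` and `per_page` field names from the example above. It is illustrative only: `page_urls` is a hypothetical helper, and starting the page count at 1 is an assumption (some APIs start at 0), so it is exposed as a parameter.

```python
import math
from typing import List
from urllib.parse import urlencode


def page_urls(base_url: str, limit: int, max_page_size: int,
              page_field: str = "page",
              page_size_field: str = "per_page",
              first_page: int = 1) -> List[str]:
    """Build one URL per page needed to cover `limit` rows."""
    page_count = math.ceil(limit / max_page_size)
    urls = []
    for i in range(page_count):
        # The page size stays constant so that page boundaries do not shift.
        query = urlencode({page_field: first_page + i,
                           page_size_field: max_page_size})
        urls.append(f"{base_url}?{query}")
    return urls


for url in page_urls("myapi.com", limit=250, max_page_size=100):
    print(url)
```

Unlike the offset style, the final request here cannot ask for fewer rows without changing which rows a given page number refers to, so any surplus rows on the last page would have to be discarded by the caller.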
   
   ## Testing
   Added unit tests and tested manually. 



> Add Auto Pagination to HTTP Storage Plugin
> ------------------------------------------
>
>                 Key: DRILL-8092
>                 URL: https://issues.apache.org/jira/browse/DRILL-8092
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.19.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.0
>
>
> See github


