
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/01/22 02:15:22 UTC

[GitHub] paul-rogers opened a new pull request #1618: DRILL-6950: Row set-based scan framework

URL: https://github.com/apache/drill/pull/1618

Adds the "plumbing" that connects the scan operator to the result set loader and the scan projection framework. See the various package-info.java files for the technical details.

The broad idea is that a (file) reader does three things:

* Decides if it can provide a schema up-front (early schema), or if it must discover the schema as the read progresses (late schema).
* If a schema is available up-front, the reader provides that schema.
* The reader then uses a result set loader to read rows into columns, optionally creating new columns (late schema) as the read progresses.
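The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not the Drill API: `ReaderSketch`, `Loader`, `Reader`, `earlySchema()`, and `scan()` are hypothetical names invented for this example, and the real result set loader is far richer than the toy column map used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ReaderSketch {

    /** Hypothetical stand-in for a result set loader: collects values per column. */
    static class Loader {
        final Map<String, List<Object>> columns = new LinkedHashMap<>();
        int rowCount = 0;

        /** Adds a column if absent; rows already written get null in the new column. */
        void addColumn(String name) {
            if (columns.containsKey(name)) {
                return;
            }
            List<Object> values = new ArrayList<>();
            for (int i = 0; i < rowCount; i++) {
                values.add(null);
            }
            columns.put(name, values);
        }

        /** Writes one row; columns not seen before are created on the fly (late schema). */
        void writeRow(Map<String, Object> row) {
            for (String name : row.keySet()) {
                addColumn(name);
            }
            for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
                e.getValue().add(row.get(e.getKey()));
            }
            rowCount++;
        }
    }

    /** Hypothetical reader contract: an optional up-front schema, then the read itself. */
    interface Reader {
        Optional<List<String>> earlySchema();
        void read(Loader loader);
    }

    /** Assumed driver: applies the early schema if there is one, then lets the reader load rows. */
    static Loader scan(Reader reader) {
        Loader loader = new Loader();
        reader.earlySchema().ifPresent(schema -> schema.forEach(loader::addColumn));
        reader.read(loader);
        return loader;
    }

    public static void main(String[] args) {
        // A late-schema reader: column "b" is only discovered on the second row.
        Reader lateSchema = new Reader() {
            public Optional<List<String>> earlySchema() {
                return Optional.empty();
            }
            public void read(Loader loader) {
                loader.writeRow(Map.of("a", 1));
                loader.writeRow(Map.of("a", 2, "b", "x"));
            }
        };
        System.out.println(scan(lateSchema).columns); // prints {a=[1, 2], b=[null, x]}
    }
}
```

An early-schema reader would differ only in returning a non-empty list from `earlySchema()`, so that the columns exist before the first row is written.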

The scan framework handles all the details that were formerly handled by the reader:

* Decides how to project the columns found by the reader into the set required by the query.
* Decides when to stop reading a batch (because of a memory limit or a row limit).
* Fills in "implicit" file metadata columns.
* Fills in nulls for columns that the query requests but the reader does not provide.
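As a rough illustration of the projection, implicit-column, and null-fill duties, the sketch below builds one output row from a projection list. It is not the framework's implementation: `ProjectionSketch` and `project()` are hypothetical names, and the `filename` column with its value is used only as an example of file metadata.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionSketch {

    /**
     * Builds one output row in projection order (hypothetical helper).
     * Each projected column is taken from the reader if the reader supplied it,
     * otherwise from the implicit (file metadata) columns, otherwise set to null.
     */
    static Map<String, Object> project(List<String> projection,
                                       Map<String, Object> readerRow,
                                       Map<String, Object> implicitColumns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : projection) {
            if (readerRow.containsKey(name)) {
                out.put(name, readerRow.get(name));
            } else if (implicitColumns.containsKey(name)) {
                out.put(name, implicitColumns.get(name));
            } else {
                out.put(name, null); // missing column: filled with null
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The reader found columns a and b; the query asks for b, filename, and c.
        Map<String, Object> row = project(
                List.of("b", "filename", "c"),
                Map.of("a", 1, "b", 2),
                Map.of("filename", "data.csv"));
        System.out.println(row); // prints {b=2, filename=data.csv, c=null}
    }
}
```

Column `a` is dropped because the query did not request it, and column `c` is null because neither the reader nor the file metadata supplies it.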

Previous PRs provided the underlying mechanisms. This PR provides the "glue" and "plumbing" that connect the reader, the scan operator, and those framework mechanisms. A key goal was to minimize "collateral damage" to other operators. Although this patch introduces a new structure for the scan operator and readers, the design ensures that the new mechanism can work alongside the "legacy" scan operator and record readers. A later patch will add the final glue that retrofits the "Easy" scan framework to support the new mechanisms.

This PR does not introduce any actual readers; it is already large enough without them. Readers will come later. One unfortunate side effect is that this PR can seem a bit abstract, since there is no actual reader to connect it to. Please refer to my private "RowSetRev4" branch if you want a preview of how the readers work.

Finally, this PR includes a large number of unit tests that validate all of the new mechanisms.
