Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/10/29 15:14:43 UTC

[GitHub] [drill] vvysotskyi opened a new pull request #1886: DRILL-7273: Introduce operators for handling metadata

vvysotskyi opened a new pull request #1886: DRILL-7273: Introduce operators for handling metadata
URL: https://github.com/apache/drill/pull/1886
 
 
   Jira: [DRILL-7273](https://issues.apache.org/jira/browse/DRILL-7273)
   
   This pull request introduces commands and operators for collecting table metadata and storing it to the metastore.
   
   The entry point for the ANALYZE command is the `MetastoreAnalyzeTableHandler` class. It creates a plan that includes metastore-specific operators for collecting metadata.
   
   New operators are the following:
   `MetadataAggBatch` - an operator that adds aggregate calls for all incoming table columns to calculate the required metadata and produces the aggregations. If the aggregation is performed on top of another aggregation, the aggregate calls required for merging metadata are added.
   
   `MetadataHandlerBatch` - an operator responsible for handling metadata returned by incoming aggregate operators and for fetching the required metadata from the metastore to produce further aggregations.
   
   `MetadataControllerBatch` - an operator responsible for converting the obtained metadata, fetching absent metadata from the metastore, and storing the resulting metadata in the metastore.
   
   `MetastoreAnalyzeTableHandler` has two classes which, depending on the table type, provide the information required for building a suitable plan for collecting metadata: `AnalyzeInfoProvider` and `MetadataInfoCollector`.
   
   Based on the number of segments, `MetastoreAnalyzeTableHandler` builds a plan of the following form:
   
   ```
   MetadataControllerBatch
   	...
   		MetadataHandlerBatch
   			MetadataAggBatch
   				MetadataHandlerBatch
   					MetadataAggBatch
   						Scan
   ```
   The lowest `MetadataAggBatch` creates the required aggregate calls for every table column (or only for the interesting ones) and produces aggregations grouped by the segment columns that correspond to a specific table level.
   The `MetadataHandlerBatch` above it populates the batch with the metadata type and other additional information.
   The `MetadataAggBatch` above that merges the previously calculated metadata to obtain metadata for the parent metadata levels, and also stores the incoming data so that it can be written to the metastore later.
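
To illustrate the merging step described above, here is a minimal sketch of how per-file column statistics could be rolled up into segment-level and table-level metadata. It is not code from this pull request: `MetadataMergeSketch`, `ColumnStats`, `mergeBySegment` and `mergeAll` are hypothetical names, and the statistics (min, max, row count) are assumed only for the example. The point is that each parent level can be computed from already aggregated child values without rescanning the data.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Drill.
class MetadataMergeSketch {

  /** Assumed per-column statistics for one metadata unit (file, segment or table). */
  static final class ColumnStats {
    final long min;
    final long max;
    final long rowCount;

    ColumnStats(long min, long max, long rowCount) {
      this.min = min;
      this.max = max;
      this.rowCount = rowCount;
    }

    /** Merges two partial results: min of mins, max of maxes, sum of row counts. */
    ColumnStats merge(ColumnStats other) {
      return new ColumnStats(
          Math.min(min, other.min),
          Math.max(max, other.max),
          rowCount + other.rowCount);
    }
  }

  /** Groups file-level stats by their segment key and merges each group. */
  static Map<String, ColumnStats> mergeBySegment(
      Map<String, ColumnStats> statsByFile,
      Map<String, String> segmentByFile) {
    Map<String, ColumnStats> result = new LinkedHashMap<>();
    for (Map.Entry<String, ColumnStats> entry : statsByFile.entrySet()) {
      String segment = segmentByFile.get(entry.getKey());
      // Map.merge stores the value if the key is new, otherwise combines both values.
      result.merge(segment, entry.getValue(), ColumnStats::merge);
    }
    return result;
  }

  /** Merges all given stats into a single parent-level value; returns null if empty. */
  static ColumnStats mergeAll(Collection<ColumnStats> stats) {
    ColumnStats result = null;
    for (ColumnStats s : stats) {
      result = (result == null) ? s : result.merge(s);
    }
    return result;
  }
}
```

In this sketch, calling `mergeBySegment` on file-level values and then `mergeAll` on its result mirrors the idea of stacking one aggregation on top of another: the upper level only merges what the lower level has already produced.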
   
   `MetadataControllerBatch` obtains all of the calculated metadata, converts it to a suitable form, and sends it to the metastore.
   
   For incremental analyze, `MetastoreAnalyzeTableHandler` creates a `Scan` over the updated files only and tells `MetadataHandlerBatch` which metadata should be fetched from the metastore, so that existing up-to-date metadata is not recalculated.
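
The incremental case can be pictured with the following sketch, which splits files into those that need to be scanned again and those whose metadata can be reused from the metastore. It is an illustration under stated assumptions, not the logic of this pull request: `IncrementalAnalyzeSketch`, `Split` and `split` are hypothetical names, and detecting changes by comparing last modification times is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Drill.
class IncrementalAnalyzeSketch {

  /** Outcome of the split: files to scan again and files whose metadata is reused. */
  static final class Split {
    final List<String> toScan = new ArrayList<>();
    final List<String> toFetchFromMetastore = new ArrayList<>();
  }

  /**
   * Compares current file modification times with the ones recorded earlier.
   * Assumption: a file is unchanged when both times are equal; new or
   * modified files are scheduled for scanning.
   */
  static Split split(Map<String, Long> currentModTimes, Map<String, Long> storedModTimes) {
    Split result = new Split();
    for (Map.Entry<String, Long> entry : currentModTimes.entrySet()) {
      Long stored = storedModTimes.get(entry.getKey());
      boolean unchanged = stored != null
          && stored.longValue() == entry.getValue().longValue();
      if (unchanged) {
        result.toFetchFromMetastore.add(entry.getKey());
      } else {
        result.toScan.add(entry.getKey());
      }
    }
    return result;
  }
}
```

Here `toScan` plays the role of the input to the reduced `Scan`, while `toFetchFromMetastore` stands for the metadata that would be read back instead of being recalculated. Files that were removed since the previous run are not handled in this sketch.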
