Posted to issues@drill.apache.org by "Volodymyr Vysotskyi (JIRA)" <ji...@apache.org> on 2019/06/07 10:47:00 UTC

[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

     [ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-7271:
---------------------------------------
    Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: Map<String, StatisticsHolder>.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to the metadata module, rename it to MetadataType, and add a new value: DIRECTORY.
5. Add JSON serialization/deserialization for ColumnStatistics and StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type; // enum
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
----------------
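private final Map<String, Object> tableStatistics; -> merged into Map<String, StatisticsHolder> (see item 1)
private final Map<String, StatisticsKind> statisticsKinds; -> merged into Map<String, StatisticsHolder> (see item 1)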
private final Set<String> partitionKeys; -> Map<String, String>
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
partitionValues (List<String>)
location (String) (for directory level metadata) - directory location

fields to modify
----------------
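private final Map<String, Object> tableStatistics; -> merged into Map<String, StatisticsHolder> (see item 1)
private final Map<String, StatisticsKind> statisticsKinds; -> merged into Map<String, StatisticsHolder> (see item 1)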
private final Set<Path> location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file 

fields to modify
----------------
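private final Map<String, Object> tableStatistics; -> merged into Map<String, StatisticsHolder> (see item 1)
private final Map<String, StatisticsKind> statisticsKinds; -> merged into Map<String, StatisticsHolder> (see item 1)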
private final Path location; - should contain directory to which file belongs
{noformat}

org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file 

fields to modify
----------------
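private final Map<String, Object> tableStatistics; -> merged into Map<String, StatisticsHolder> (see item 1)
private final Map<String, StatisticsKind> statisticsKinds; -> merged into Map<String, StatisticsHolder> (see item 1)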
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. Rename TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata()
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map<SchemaPath, ColumnStatistics> columnsStatistics;
  Map<String, StatisticsHolder> statistics;
  List<String> partitionValues;
  List<String> locations;
  long lastModifiedTime;
}
{noformat}
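
Item 1 can be pictured with a small sketch. This is not the Drill implementation: StatisticsKind and StatisticsHolder below are hypothetical stand-ins for the types named in the issue, and the merge helper and its error handling are assumptions made only for illustration. It shows how two parallel maps (values and kinds) could collapse into a single Map<String, StatisticsHolder>.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the StatisticsKind type named in the issue.
interface StatisticsKind {
  String getName();
}

// Hypothetical holder that keeps a statistic value together with its kind.
final class StatisticsHolder {
  private final Object value;
  private final StatisticsKind kind;

  StatisticsHolder(Object value, StatisticsKind kind) {
    this.value = value;
    this.kind = kind;
  }

  Object getValue() {
    return value;
  }

  StatisticsKind getKind() {
    return kind;
  }
}

final class StatisticsMergeSketch {

  // Combines the two parallel maps into one map keyed by statistic name.
  static Map<String, StatisticsHolder> merge(Map<String, Object> values,
                                             Map<String, StatisticsKind> kinds) {
    Map<String, StatisticsHolder> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : values.entrySet()) {
      StatisticsKind kind = kinds.get(entry.getKey());
      if (kind == null) {
        // Assumption: a value without a matching kind is treated as a caller error.
        throw new IllegalArgumentException("No kind for statistic: " + entry.getKey());
      }
      merged.put(entry.getKey(), new StatisticsHolder(entry.getValue(), kind));
    }
    return merged;
  }

  public static void main(String[] args) {
    StatisticsKind rowCount = () -> "rowCount";
    Map<String, Object> values = new LinkedHashMap<>();
    values.put("rowCount", 100L);
    Map<String, StatisticsKind> kinds = new LinkedHashMap<>();
    kinds.put("rowCount", rowCount);

    Map<String, StatisticsHolder> merged = merge(values, kinds);
    System.out.println(merged.get("rowCount").getValue());
  }
}
```

With a single map, a statistic's value and its kind cannot drift apart, which is presumably the motivation for the merge.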

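Items 4 and 6 could look roughly like the sketch below. Only the DIRECTORY value, the two constants, and the three MetadataInfo fields come from the issue text. The other enum values, the constructor, the accessors, and equals/hashCode are assumptions added so the example compiles on its own.

```java
import java.util.Objects;

// Only DIRECTORY is named in the issue; the other values are assumed for illustration.
enum MetadataType {
  TABLE, DIRECTORY, PARTITION, FILE, ROW_GROUP
}

final class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  private final MetadataType type;
  private final String key;
  private final String identifier;

  // Assumption: instances are immutable and the type is mandatory.
  MetadataInfo(MetadataType type, String key, String identifier) {
    this.type = Objects.requireNonNull(type, "type");
    this.key = key;
    this.identifier = identifier;
  }

  MetadataType getType() {
    return type;
  }

  String getKey() {
    return key;
  }

  String getIdentifier() {
    return identifier;
  }

  // Value-based equality so that two descriptions of the same metadata unit compare equal.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MetadataInfo)) {
      return false;
    }
    MetadataInfo that = (MetadataInfo) o;
    return type == that.type
        && Objects.equals(key, that.key)
        && Objects.equals(identifier, that.identifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, key, identifier);
  }
}
```

For example, new MetadataInfo(MetadataType.TABLE, MetadataInfo.GENERAL_INFO_KEY, null) would describe a table-level entry under these assumptions.
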
  was:
Items 1-12 as in the description above; item 13 was not present.


> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7271
>                 URL: https://issues.apache.org/jira/browse/DRILL-7271
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Arina Ielchiieva
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.17.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)