Posted to reviews@impala.apache.org by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org> on 2020/02/25 19:41:58 UTC

[Impala-ASF-CR] IMPALA-9363: Add support for skipping given table types

Fang-Yu Rao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15290 )


Change subject: IMPALA-9363: Add support for skipping given table types
......................................................................

IMPALA-9363: Add support for skipping given table types

This patch allows one to provide Impala with a list of blacklisted table
types at startup so that tables of the listed types are not loaded when
CatalogServiceCatalog retrieves metadata from the Hive Metastore. The
comma-separated list of blacklisted table types is passed to Impala via
'--catalogd_args=--blacklisted_table_types=<list_of_blacklisted_types>'.
Five table types are supported, namely, 'hdfs', 'hbase', 'view',
'data_source', and 'kudu'.
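As a rough illustration of how such a flag value might be turned into a set of table types, here is a minimal standalone Java sketch. The enum `BlacklistedType` and the class `TableTypeBlacklistParser` are hypothetical stand-ins (the patch uses the Thrift-generated TTableType, and its actual parsing logic is not shown in this thread). The sketch assumes that names are matched case-insensitively and that an unknown name should be rejected.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the TTableType values that the five supported
// flag names ('hdfs', 'hbase', 'view', 'data_source', 'kudu') correspond to.
enum BlacklistedType {
  HDFS, HBASE, VIEW, DATA_SOURCE, KUDU
}

final class TableTypeBlacklistParser {
  private TableTypeBlacklistParser() {}

  // Parses a flag value such as "kudu, hbase" into a set of table types.
  // Whitespace around names is trimmed, empty entries are skipped, and
  // matching is case-insensitive (an assumption of this sketch).
  // An unrecognized name makes valueOf() throw IllegalArgumentException.
  static Set<BlacklistedType> parse(String flagValue) {
    Set<BlacklistedType> result = EnumSet.noneOf(BlacklistedType.class);
    if (flagValue == null) {
      return result;
    }
    for (String token : flagValue.split(",")) {
      String name = token.trim();
      if (name.isEmpty()) {
        continue;
      }
      result.add(BlacklistedType.valueOf(name.toUpperCase(Locale.ROOT)));
    }
    return result;
  }
}
```

For example, parse("kudu, hbase") yields a set containing KUDU and HBASE. Using an EnumSet keeps membership checks cheap when the catalog later tests each table's type against the blacklist.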

Current limitation:
This patch does not handle the case in which a user would like to
blacklist the views created on top of the specified table types. For
instance, even if a user puts 'kudu' on the list of blacklisted
table types, the metadata of a view created on top of a Kudu table such
as 'functional_kudu.alltypesagg' would still be loaded.

Testing:
- Added an E2E test in test_blacklisted_dbs_and_tables.py.

Change-Id: I49c4062b48f1bb87adfd851ee26cc144fb70b4b7
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java
M tests/custom_cluster/test_blacklisted_dbs_and_tables.py
7 files changed, 113 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/15290/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15290
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I49c4062b48f1bb87adfd851ee26cc144fb70b4b7
Gerrit-Change-Number: 15290
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9363: Add support for skipping given table types

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15290 )

Change subject: IMPALA-9363: Add support for skipping given table types
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5333/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49c4062b48f1bb87adfd851ee26cc144fb70b4b7
Gerrit-Change-Number: 15290
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 20:26:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9363: Add support for skipping given table types

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15290 )

Change subject: IMPALA-9363: Add support for skipping given table types
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG@14
PS1, Line 14: --catalogd_args=--blacklisted_table_types=<list_of_blacklisted_types>'.
            : Five table types are supported, namely, 'hdfs', 'hbase', 'view',
            : 'data_source', and 'kudu
I think this is a very broad way to ignore tables. Rather than ignoring an entire table type, can we ignore tables based on their SerDe? For instance, if the user only wants to ignore JSON tables, that is not possible with this patch short of ignoring all HDFS tables.
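For comparison, SerDe-based filtering as suggested here might look roughly like the following sketch. It assumes a hypothetical flag whose value is a comma-separated list of fully qualified SerDe class names (no such flag exists in this patch), and it matches each table's SerDeInfo serializationLib exactly. `SerDeBlacklist` and `shouldSkip` are illustrative names only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical SerDe-based blacklist; the patch under review does not define one.
final class SerDeBlacklist {
  private final Set<String> blacklistedSerDes;

  // flagValue is assumed to be a comma-separated list of fully qualified
  // SerDe class names, e.g. "org.apache.hadoop.hive.serde2.avro.AvroSerDe".
  SerDeBlacklist(String flagValue) {
    this.blacklistedSerDes = Arrays.stream(flagValue.split(","))
        .map(String::trim)
        .filter(name -> !name.isEmpty())
        .collect(Collectors.toSet());
  }

  // Returns true if the table's serialization library exactly matches one of
  // the blacklisted class names. Tables without a serialization library are
  // never skipped by this check.
  boolean shouldSkip(String serializationLib) {
    return serializationLib != null && blacklistedSerDes.contains(serializationLib);
  }
}
```

Note that a rule like this skips every table that uses the named SerDe, regardless of which Impala table type it would otherwise be loaded as.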


http://gerrit.cloudera.org:8080/#/c/15290/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/15290/1/be/src/catalog/catalog.cc@54
PS1, Line 54: List of blacklisted table types
Can you add more information here? Specifically, mention that this is a comma-separated string in which each value represents a table type (or a fully qualified SerDe class name). Also, make sure to mention that tables of the types listed here will not be discovered by the Catalog server.


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1486
PS1, Line 1486:           continue;
Maybe add an info log saying that this table will be ignored because its type is blacklisted.


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3085
PS1, Line 3085:   private TTableType getTableType(org.apache.hadoop.hive.metastore.api.Table msTbl) {
change to static method?


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3101
PS1, Line 3101: HBASE_TABLE
Why do we return Hbase table type here? Can you please reconfirm?


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3105
PS1, Line 3105:       return null;
This line will throw an NPE at line 1485. Maybe just ignore the table if you are unable to determine its type here.
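A minimal sketch of the two suggestions above (skip a table whose type cannot be determined, and log an info message whenever a table is skipped), written as standalone Java. `TblType`, `TblInfo`, and `TableFilter.tablesToLoad` are hypothetical; the real loading loop in CatalogServiceCatalog and its logger are not shown in this thread, so java.util.logging is used here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical table-type enum standing in for TTableType.
enum TblType { HDFS, HBASE, VIEW, DATA_SOURCE, KUDU }

// Hypothetical minimal table descriptor; type is null when the type could
// not be determined from the metastore object.
final class TblInfo {
  final String name;
  final TblType type;

  TblInfo(String name, TblType type) {
    this.name = name;
    this.type = type;
  }
}

final class TableFilter {
  private static final Logger LOG = Logger.getLogger(TableFilter.class.getName());

  private TableFilter() {}

  // Returns the names of the tables that should be loaded, skipping (and
  // logging) tables whose type is unknown or blacklisted.
  static List<String> tablesToLoad(List<TblInfo> tables, Set<TblType> blacklist) {
    List<String> toLoad = new ArrayList<>();
    for (TblInfo tbl : tables) {
      if (tbl.type == null) {
        // Check for null first so an undetermined type is never used in a lookup.
        LOG.info("Skipping table " + tbl.name + ": unable to determine its type.");
        continue;
      }
      if (blacklist.contains(tbl.type)) {
        LOG.info("Skipping table " + tbl.name + ": its type " + tbl.type
            + " is blacklisted.");
        continue;
      }
      toLoad.add(tbl.name);
    }
    return toLoad;
  }
}
```

Treating an undetermined type as "skip and log" rather than returning null to the caller keeps the failure visible in the catalog log without aborting the load of the remaining tables.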


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java
File fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java:

http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java@121
PS1, Line 121:       else {
Nit: move to line 120, after the }.



-- 
Gerrit-Comment-Date: Tue, 25 Feb 2020 20:11:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9363: Add support for skipping given table types

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/15290 )

Change subject: IMPALA-9363: Add support for skipping given table types
......................................................................


Patch Set 1:

(3 comments)

Hi Vihang, I have replied to you regarding whether it is possible to perform more fine-grained filtering of tables based on the information in the corresponding SerDeInfo. Please let me know how you would like to proceed, and whether I have missed anything important. Thanks!

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15290/1//COMMIT_MSG@14
PS1, Line 14: --catalogd_args=--blacklisted_table_types=<list_of_blacklisted_types>'.
            : Five table types are supported, namely, 'hdfs', 'hbase', 'view',
            : 'data_source', and 'kudu
> I think this is a very broad way to ignore the table types. Rather than ign
Thanks for the suggestion Vihang!

In this regard, I conducted a preliminary investigation and found that, except for HDFS tables, it does not seem straightforward to filter tables at a finer granularity based on the 'serdeInfo' field of the corresponding StorageDescriptor, which is itself a field of the class org.apache.hadoop.hive.metastore.api.Table.

Specifically, I tried to collect all possible mappings from TTableType values to serialization libraries (recall that each SerDeInfo instance has a 'serializationLib' field). The mappings are listed below.

Possible mappings for HDFS tables:
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.serde2.avro.AvroSerDe
TTableType.HDFS_TABLE -> org.apache.hadoop.hive.ql.io.orc.OrcSerde

Possible mapping for HBase tables:
TTableType.HBASE_TABLE -> org.apache.hadoop.hive.hbase.HBaseSerDe

Possible mapping for views:
TTableType.VIEW -> null

Possible mapping for data source tables:
TTableType.DATA_SOURCE_TABLE -> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Possible mapping for Kudu tables:
TTableType.KUDU_TABLE -> org.apache.hadoop.hive.kudu.KuduSerDe
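The listing above can be encoded as a reverse index from serialization library to candidate table types, which makes the ambiguity concrete: LazySimpleSerDe appears for both HDFS and data source tables, so a SerDe name alone cannot always identify the table type. This is an illustrative sketch only; `CandidateType` and `SerDeTypeIndex` are hypothetical names, the index contains nothing beyond the mappings observed above, and treating a null serialization library as a view reflects only that observation.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical mirror of the TTableType values discussed in this thread.
enum CandidateType { HDFS_TABLE, HBASE_TABLE, VIEW, DATA_SOURCE_TABLE, KUDU_TABLE }

// Reverse index built only from the mappings observed above.
final class SerDeTypeIndex {
  private static final Map<String, Set<CandidateType>> BY_SERDE = new HashMap<>();

  static {
    String[] hdfsSerDes = {
      "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
      "org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe",
      "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
      "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe",
      "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "org.apache.hadoop.hive.serde2.avro.AvroSerDe",
      "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
    };
    for (String serDe : hdfsSerDes) {
      put(serDe, CandidateType.HDFS_TABLE);
    }
    put("org.apache.hadoop.hive.hbase.HBaseSerDe", CandidateType.HBASE_TABLE);
    put("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", CandidateType.DATA_SOURCE_TABLE);
    put("org.apache.hadoop.hive.kudu.KuduSerDe", CandidateType.KUDU_TABLE);
  }

  private SerDeTypeIndex() {}

  private static void put(String serDe, CandidateType type) {
    BY_SERDE.computeIfAbsent(serDe, k -> EnumSet.noneOf(CandidateType.class)).add(type);
  }

  // Returns every table type observed with the given serialization library.
  // A null library is treated as a view, matching the observation above;
  // an unknown library yields an empty set.
  static Set<CandidateType> candidates(String serializationLib) {
    if (serializationLib == null) {
      return EnumSet.of(CandidateType.VIEW);
    }
    Set<CandidateType> types = BY_SERDE.get(serializationLib);
    return types == null ? EnumSet.noneOf(CandidateType.class)
                         : Collections.unmodifiableSet(types);
  }
}
```

For example, candidates("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe") returns both HDFS_TABLE and DATA_SOURCE_TABLE, so a rule keyed on that SerDe could not tell the two kinds of tables apart.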

I also examined the values of the other fields of SerDeInfo, but so far I have not found one that would allow more fine-grained filtering of tables.


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3085
PS1, Line 3085:   private TTableType getTableType(org.apache.hadoop.hive.metastore.api.Table msTbl) {
> change to static method?
Done


http://gerrit.cloudera.org:8080/#/c/15290/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3101
PS1, Line 3101: HBASE_TABLE
> Why do we return Hbase table type here? Can you please reconfirm?
Thanks for catching this! It should be TTableType.HDFS_TABLE instead.



-- 
Gerrit-Comment-Date: Tue, 25 Feb 2020 23:08:22 +0000
Gerrit-HasComments: Yes