Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2018/07/02 23:04:21 UTC

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#6).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database and table identifiers during
creation, storing only the lowercased version. On database and table
lookup, the HMS automatically does a case-insensitive comparison. During
table creation, Kudu checks that table names are valid UTF-8 and does
not transform identifiers. During table lookup, Kudu requires that the
table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters cannot be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.
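
The scheme described above can be illustrated with a minimal, standalone
sketch. This is not the code from this patch: TableIndex,
NormalizeTableName, and the ASCII-only down-casing are hypothetical
stand-ins chosen for illustration, and the real Catalog Manager stores far
more than a name per table.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not a Kudu API): down-cases ASCII letters when the
// HMS integration is enabled, otherwise returns the name unchanged.
std::string NormalizeTableName(const std::string& name, bool hms_enabled) {
  if (!hms_enabled) return name;
  std::string out = name;
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

// Hypothetical stand-in for the by-name table map: keys are normalized
// names, values are the case-preserved names as given at creation time.
class TableIndex {
 public:
  explicit TableIndex(bool hms_enabled) : hms_enabled_(hms_enabled) {}

  // Returns false if a table with the same normalized name already exists.
  bool Create(const std::string& name) {
    return by_name_.emplace(NormalizeTableName(name, hms_enabled_), name)
        .second;
  }

  // Looks up by normalized name; on success writes the stored,
  // case-preserved name to *stored and returns true.
  bool Lookup(const std::string& name, std::string* stored) const {
    assert(stored != nullptr);
    auto it = by_name_.find(NormalizeTableName(name, hms_enabled_));
    if (it == by_name_.end()) return false;
    *stored = it->second;
    return true;
  }

 private:
  const bool hms_enabled_;
  std::map<std::string, std::string> by_name_;
};
```

With hms_enabled set to false, the normalized key is the original name, so
this sketch falls back to exact, case-sensitive matching, mirroring the
opt-in behavior described above.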

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to start up and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').
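
As a rough illustration of that startup check, the sketch below groups
Hive-compatible names by their down-cased form and reports any group with
more than one member. FindCaseConflicts and ToLowerAscii are hypothetical
names, the down-casing is ASCII-only, and std::regex's \w may not match
exactly the same characters as the HMS's own rule; none of this is taken
from the patch itself.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: ASCII-only down-casing.
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns a map from normalized name to the original names that collide on
// it. Names that do not fully match the Hive naming regex are skipped, so
// they can never be reported as conflicts.
std::map<std::string, std::vector<std::string>> FindCaseConflicts(
    const std::vector<std::string>& table_names) {
  static const std::regex kHiveName(R"([\w_/]+\.[\w_/]+)");
  std::map<std::string, std::vector<std::string>> groups;
  for (const auto& name : table_names) {
    if (!std::regex_match(name, kHiveName)) continue;
    groups[ToLowerAscii(name)].push_back(name);
  }
  // Keep only the groups that actually contain a conflict.
  for (auto it = groups.begin(); it != groups.end();) {
    if (it->second.size() < 2) {
      it = groups.erase(it);
    } else {
      ++it;
    }
  }
  return groups;
}
```

A non-empty result would correspond to the case where the operator must
rename the listed tables before enabling the HMS integration.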

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 410 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 6
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>