You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2018/06/26 00:04:18 UTC

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Hello Adar Dembo, Hao Hao, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10817

to review the following change.


Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters previously
could not be altered or deleted. This commit introduces changes to how
the Catalog Manager handles identifiers when the HMS integration is
enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
metadata entries in the sys-catalog, and storing a 'normalized'
(down-cased) identifier in the by-name table map. The various parts of
the catalog manager which deal with the by-name map are converted to use
the normalized version of the name.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '(\w_/)+\.(\w_/)+').

WIP: I'm unsure about these details of the patch:

* Database names and table names are treated the same, including case
  preservation. This may lead to confusion since Kudu tables 'foo.bar'
  and 'FOO.baz' will be in the same Hive database.

* When a client opens a table the name stored in the table object is the
  same as the client specifies.  So if table 'default.FOO' is opened with name
  'Default.Foo', the table object will internally hold the name
  'Default.Foo'.

Both of these seem like the right thing to do to me, but may be a source
of confusion.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
9 files changed, 356 insertions(+), 95 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h
File src/kudu/hms/hms_catalog.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@162
PS3, Line 162:   Status Reconnect();
> But why is it important for NormalizeTableName to mutate in place?
It's not inherently important, but I don't think it's more complicated to do it in-place than the alternative.


http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/master/catalog_manager.cc@4796
PS5, Line 4796: In this case the table is guaranteed to be a legacy table which
              :     // has survived since before the cluster was configured to integrate with
              :     // the HMS.
> That's not true across the board though; NormalizeTableName is used to norm
Well in CreateTable there shouldn't be an existing table at all, so this comment doesn't really apply.

If NormalizeTableName returned an error then CreateTable would have to conditionally call it based on whether the hms integration is enabled or not.  I think it's simpler the way it is.  If the integration is enabled and the table name isn't valid, then CreateTable will still fail because the table name gets validated again in HmsCatalog::CreateTable, and then again by the HMS.


http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/mini-cluster/external_mini_cluster.cc
File src/kudu/mini-cluster/external_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/mini-cluster/external_mini_cluster.cc@264
PS5, Line 264: void ExternalMiniCluster::DisableMetastoreIntegration() {
> Shouldn't this enforce that the cluster is shut down? The enforcement in Se
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 7
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 19:52:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc@108
PS1, Line 108: UMPED
jumps



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 00:22:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#7).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 411 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 7
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 10
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 22:55:52 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#8).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 422 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 8
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h
File src/kudu/hms/hms_catalog.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@131
PS3, Line 131: only the
             :   // characters a-z, A-Z, 0-9, _, and /.
Nit: I think you can get by with "only ascii alphanumerics, _, and /"


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@134
PS3, Line 134: Normalizeed
Normalized


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@140
PS3, Line 140: table
Nit: "name" maybe? You've used 'table' for a hive::Table* in PopulateTable so I don't think it should be used for a std::string here.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@162
PS3, Line 162:   static Status ParseTableName(const std::string& table,
I think StringPiece is a more idiomatic choice for "pointer to part of a string". It doesn't have a mutable_data() member (which you'll need for ToLowerCase) but you can add that.

Also, please update the function doc to explain what the lifecycles of hms_database and hms_table are now.

On second thought, why bother with this optimization? Why not just continue to return std::strings? This doesn't seem performance critical, so the extra lifecycle considerations don't seem worth it.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc@105
PS3, Line 105: '_', and '/'
> The HMS appears to always allow underscores, and allows forward slashes by 
Should we at least warn users about how '/' is configurable in the HMS? Or is that too much detail?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@477
PS3, Line 477: vector<string>(
Surprised you needed this. Below too.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h@766
PS3, Line 766: 'normalized_table_names_map_', 'table_ids_map_',
             :   // and 'tablet_map_'
Nit: feel free to make this vaguer so it needn't be updated whenever the maps change names.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@318
PS3, Line 318:       auto* existing = InsertOrReturnExisting(&catalog_manager_->normalized_table_names_map_,
I'm a little nervous about this; I think there may have been a good reason for why the code overwrites entries in table_names_map_, beyond consistency with the insertion into table_ids_map_ and tablet_map_.

I can't think of any now, though. Anything interesting in the git history?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@328
PS3, Line 328:             Substitute("$0 or $1 [id=$2]", (*existing)->ToString(), l.data().name(), table_id));
Wouldn't it be helpful to show the IDs of both tables?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513
PS3, Line 1513: );
> When HMS integration is enabled, maybe we should log the normalized  table 
If you do that, update the log statement on L1525 and possibly elsewhere here too.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@2304
PS3, Line 2304:     // reserve the new table name, since it's already reserved by way of
Nit: normally I wouldn't advocate for a double negative, but I wonder if the condition would be more readable if it matched the comment:

  if (!(table_name != req.new_table_name() && normalized_table_name == normalized_new_table_name))


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@4794
PS3, Line 4794:   string normalized_table_name = table_name;
Why ignore_result()? AFAICT every place calling CatalogManager::NormalizeTableName could return a bad Status, so why not propagate this?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h
File src/kudu/mini-cluster/external_mini_cluster.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h@322
PS3, Line 322:   // Enable Hive Metastore integration.
             :   // Overrides HMS integration options set by ExternalMiniClusterOptions.
             :   void EnableMetastoreIntegration();
             : 
             :   // Disable Hive Metastore integration.
             :   // Overrides HMS integration options set by ExternalMiniClusterOptions.
             :   void DisableMetastoreIntegration();
These should probably doc that the cluster must be restarted for the change to take effect.

Alternatively, perhaps it should be like SetDaemonBinPath and the cluster has to be stopped to even call these? Seems simpler for clients.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Jun 2018 17:43:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#10).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 421 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/10
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 10
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#5).

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

WIP: I'm unsure about these details of the patch:

* The current approach relies on the HMS preserving table identifier
  case in notification log events. I want to consult with Hive experts
  about whether this is a guarentee of the API or just happens to be the
  case in the current implementation.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 404 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 5
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/9/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/9/src/kudu/integration-tests/master_hms-itest.cc@560
PS9, Line 560: seing
seeing



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 9
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 22:40:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#3).

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

WIP: I'm unsure about these details of the patch:

* Database names and table names are treated the same, including case
  preservation. This may lead to confusion since Kudu tables 'foo.bar'
  and 'FOO.baz' will be in the same Hive database.

* When a client opens a table the name stored in the table object is the
  name the client specifies.  So if table 'default.FOO' is opened with name
  'Default.Foo', the table object will internally hold the name
  'Default.Foo'.

Both of these seem like the right thing to do to me, but may be a source
of confusion.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
9 files changed, 367 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h
File src/kudu/hms/hms_catalog.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@131
PS3, Line 131: 
             :   // Valid Kudu/HMS table names consist 
> Nit: I think you can get by with "only ascii alphanumerics, _, and /"
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@134
PS3, Line 134: alphanumeri
> Normalized
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@140
PS3, Line 140: tils.
> Nit: "name" maybe? You've used 'table' for a hive::Table* in PopulateTable 
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@162
PS3, Line 162:   // Parses a Kudu table name into a Hive database and table name.
>  I think StringPiece is a more idiomatic choice for "pointer to part of a string". It doesn't have a mutable_data() member (which you'll need for ToLowerCase) but you can add that.

It's not trivial to add mutable_data() since StringPiece's data field is a const pointer. It appears that StringPiece is generally used as a const view into a string.

> Also, please update the function doc to explain what the lifecycles of hms_database and hms_table are now. 

Done

> On second thought, why bother with this optimization? Why not just continue to return std::strings? This doesn't seem performance critical, so the extra lifecycle considerations don't seem worth it.

It's necessary in order to implement NormalizeTableName as mutating the table name in place.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc@105
PS3, Line 105: '_', and '/'
> Should we at least warn users about how '/' is configurable in the HMS? Or 
Too much detail IMO.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@477
PS3, Line 477:  Kudu first.
> Surprised you needed this. Below too.
{ "..", ".." } compiles here, but not in the ASSERT calls below.  My experience with relying on the compiler to infer these things has been platform specific and generally poor, so I reflexively reach for the explicit form now.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.h@766
PS3, Line 766: by-name map, table-id map, and tablet
             :   // map), and loads t
> Nit: feel free to make this vaguer so it needn't be updated whenever the ma
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@318
PS3, Line 318:       auto* existing = InsertOrReturnExisting(&catalog_manager_->normalized_table_names_map_,
> I'm a little nervous about this; I think there may have been a good reason 
This line exists nearly unaltered since early 2014, so I don't think there's anything special about the overwriting behavior.  I think it's just always been an invariant that you can't have two tables with the same name.

https://github.com/apache/kudu/commit/b0af34c92a0ea0b9c9764798d514b6c5aadda7d4


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@2304
PS3, Line 2304:           normalized_table_name == normalized_new_table_name)) {
> Nit: normally I wouldn't advocate for a double negative, but I wonder if th
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@4794
PS3, Line 4794:   if (hms::HmsCatalog::IsEnabled()) {
> I'm going to add a comment with more color in a comment, but ignoring this 
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h
File src/kudu/mini-cluster/external_mini_cluster.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/mini-cluster/external_mini_cluster.h@322
PS3, Line 322:   // Enable Hive Metastore integration.
             :   // Overrides HMS integration options set by ExternalMiniClusterOptions.
             :   // The cluster must be shut down before calling this method.
             :   void EnableMetastoreIntegration();
             : 
             :   // Disable Hive Metastore integration.
             :   // Overrides HMS integration option
> These should probably doc that the cluster must be restarted for the change
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 5
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Jun 2018 23:57:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#9).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 421 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 9
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/9/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/9/src/kudu/integration-tests/master_hms-itest.cc@560
PS9, Line 560: seein
> seeing
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 10
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 22:55:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#6).

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 410 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 6
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@18
PS1, Line 18: deleted.
^ when the HMS integration is enabled.


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@17
PS1, Line 17: previously
            : could
can


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@28
PS1, Line 28: entries
entry


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@29
PS1, Line 29: he by
^ ephemeral


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@31
PS1, Line 31: the name
^ When the HMS integration is not configured, normalized table names are simply the original table name, so the behavior changes that this patch introduces are entirely opt-in.


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@48
PS1, Line 48: same as
name



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 17:23:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@7
PS1, Line 7: with upper case chars
> I feel it might be a bit confusing to call it support upper case table name
I've added a 'summary' sentence which hopefully clarifies.


http://gerrit.cloudera.org:8080/#/c/10817/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/2//COMMIT_MSG@47
PS2, Line 47: FOO.baz
> nit: FOO.bar
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@65
PS3, Line 65: MasterHmsTest : public ExternalMiniClusterITestBase {
> Should we add some more test cases with upper cases table name in other int
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@529
PS3, Line 529: cluster_->ShutdownNodes(cluster::ClusterNodes::MASTERS_ONLY);
             :   cluster_->EnableMetastoreIntegration();
             :   ASSERT_TRUE(cluster_->Restart().IsNetworkError());
> nit: add a quick comment for what those lines are doing.
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1488
PS3, Line 1488:   if (FLAGS_catalog_manager_check_ts_count_for_create_table &&
> remove this, it was just to reduce log spam in tests.
Done


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513
PS3, Line 1513: table_name
> When HMS integration is enabled, maybe we should log the normalized  table 
It ends up being difficult to determine when the duplicate table only differs based on case, because the table object we lookup doesn't include the name.  To get the name you have to take a CoW lock.  Mind if we punt on this and revisit if experience shows it's a point of confusion?



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Jun 2018 17:41:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Reviewed-on: http://gerrit.cloudera.org:8080/10817
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 421 insertions(+), 116 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 11
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@17
PS1, Line 17: can not be
            : alter
> can
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@18
PS1, Line 18: the HMS 
> ^ when the HMS integration is enabled.
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@28
PS1, Line 28: og meta
> entry
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@29
PS1, Line 29: by-na
> ^ ephemeral
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@31
PS1, Line 31: name. Wh
> ^ When the HMS integration is not configured, normalized table names are si
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@39
PS1, Line 39: les be
> should be braces
Done


http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@48
PS1, Line 48: 
> name
Done


http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc@108
PS1, Line 108: UMPS/
> jumps
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 21:11:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513
PS3, Line 1513: );
> Should I just use the normalized name in this error message / elsewhere?  I
I don't really care either way, and I buy your argument that because normalization is an implementation detail, it shouldn't be surfaced in logs.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Jun 2018 22:50:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@328
PS3, Line 328:             Substitute("$0 or $1 [id=$2]", (*existing)->ToString(), l.data().name(), table_id));
> Wouldn't it be helpful to show the IDs of both tables?
It's a little misleading, but this does in fact show both IDs.  TableInfo::ToString does the equivalent of Substitute("$0 [$1]", table_name, table_id).  I'd have used that for the 'table' local variable as well, however it doesn't get its table name field set until the Commit call a few lines down, so the table name shows up as empty.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513
PS3, Line 1513: );
> If you do that, update the log statement on L1525 and possibly elsewhere he
Should I just use the normalized name in this error message / elsewhere?  I can kind of see it both ways.  I want to avoid overly verbose error messages attempting to explain what normalization is, though.  I think that will cause more confusion than just picking unormalized or normalized.  Normalization is an implementation detail, the public facing explanation should be case preserving, but insensitive on lookup.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@4794
PS3, Line 4794:   string normalized_table_name = table_name;
> Why ignore_result()? AFAICT every place calling CatalogManager::NormalizeTa
I'm going to add a comment with more color in a comment, but ignoring this status is critical to supporting legacy tables after enabling the HMS integration in an existing cluster.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Jun 2018 19:34:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#4).

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

Summary: When the HMS integration is enabled, Kudu now preserves table
name casing, but uses case-insensitive lookups to retrieve tables.

Background: The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

WIP: I'm unsure about these details of the patch:

* Database names and table names are treated the same, including case
  preservation. This may lead to confusion since Kudu tables 'foo.bar'
  and 'FOO.bar' will be in the same Hive database.

* When a client opens a table the name stored in the table object is the
  name the client specifies.  So if table 'default.FOO' is opened with name
  'Default.Foo', the table object will internally hold the name
  'Default.Foo'.

Both of these seem like the right thing to do to me, but may be a source
of confusion.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
10 files changed, 368 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/8/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/8/src/kudu/integration-tests/master_hms-itest.cc@563
PS8, Line 563:   // leader, so the Restart() call may suceed. In these situations we
succeed



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 8
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 20:57:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 9:

> Patch Set 9: -Verified
> 
> Build Started http://jenkins.kudu.apache.org/job/kudu-gerrit/14041/

master_hms-itest now appears to be flake free on the latest revisions: http://dist-test.cloudera.org/job?job_id=dan.1530650594.134508


-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 9
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Jul 2018 21:53:03 +0000
Gerrit-HasComments: No

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h
File src/kudu/hms/hms_catalog.h:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.h@162
PS3, Line 162:   // Parses a Kudu table name into a Hive database and table name.
> >  I think StringPiece is a more idiomatic choice for "pointer to part of a
But why is it important for NormalizeTableName to mutate in place?


http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/master/catalog_manager.cc@4796
PS5, Line 4796: In this case the table is guaranteed to be a legacy table which
              :     // has survived since before the cluster was configured to integrate with
              :     // the HMS.
That's not true across the board though; NormalizeTableName is used to normalize the name of a new table in CreateTable. Shouldn't we preserve the error in that case?


http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/mini-cluster/external_mini_cluster.cc
File src/kudu/mini-cluster/external_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10817/5/src/kudu/mini-cluster/external_mini_cluster.cc@264
PS5, Line 264: void ExternalMiniCluster::DisableMetastoreIntegration() {
Shouldn't this enforce that the cluster is shut down? The enforcement in SetDaemonBinPath is in the SetExePath calls.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 5
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 28 Jun 2018 00:28:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@7
PS1, Line 7: with upper case chars
I feel it might be a bit confusing to call it support upper case table name, since the patch is actually making table naming case insensitive as what I understand. It may be more clear to the users if we explicitly call out that table naming is now case insensitive with HMS integration enabled.


http://gerrit.cloudera.org:8080/#/c/10817/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/2//COMMIT_MSG@47
PS2, Line 47: FOO.baz
nit: FOO.bar


http://gerrit.cloudera.org:8080/#/c/10817/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/3//COMMIT_MSG@54
PS3, Line 54: source
            : of confusion
I feel confusion might be mitigated if we always store the normalized form, and it will be more aligned with the HMS behavior. But it looks like it is hard be do so for legacy tables.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc@105
PS3, Line 105: '_', and '/'
Is this naming restriction configurable in HMS?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@65
PS3, Line 65: MasterHmsTest : public ExternalMiniClusterITestBase {
Should we add some more test cases with upper cases table name in other integration tests, e.g. master-stress-test.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/integration-tests/master_hms-itest.cc@529
PS3, Line 529: cluster_->ShutdownNodes(cluster::ClusterNodes::MASTERS_ONLY);
             :   cluster_->EnableMetastoreIntegration();
             :   ASSERT_TRUE(cluster_->Restart().IsNetworkError());
nit: add a quick comment for what those lines are doing.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1513
PS3, Line 1513: table_name
When HMS integration is enabled, maybe we should log the normalized  table name to avoid confusion?


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@2301
PS3, Line 2301: In this case
Since without HMS integration, we do not allow rename to same table name. It feels a bit wired that we consider rename to same normalized name legal while create another table with the same normalized name illegal with HMS integration.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 23:33:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10817

to look at the new patch set (#2).

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................

WIP: KUDU-2191: support table-name identifiers with upper case chars

The HMS lowercases all database (table) identifiers during database
(table) creation, only storing the lowercased version. On database and
table lookup the HMS automatically does a case-insensitive compare.
During table creation Kudu checks that table names are valid UTF-8, and
does no transformations on identiers. During table lookups Kudu requires
that the table name match exactly, including case.

As a result of these behavior differences and the design of the
notification log listener, tables with upper-case characters can not be
altered or deleted when the HMS integration is enabled. This commit
fixes this by changing how the Catalog Manager handles identifiers when
the HMS integration is enabled:

* During table creation, the Catalog Manager preserves the case of table
  names.
* On table lookup, the Catalog Manager does a case-insensitive
  comparison to find the table.

This is implemented by storing the preserved case in the table's
sys-catalog metadata entry, and storing a 'normalized' (down-cased)
identifier in the ephemeral by-name table map. The various parts of the
catalog manager which deal with the by-name map are converted to use the
normalized version of the name. When the HMS integration is not
configured, normalized table names are equal to the original table name,
so the behavior changes that this patch introduces are entirely opt-in.

There is one edge case that complicates turning on the HMS integration
in rare circumstances: if there are existing (legacy) tables with names
which map to the same normalized form (e.g. differ only in case), the
catalog manager will fail to startup and instruct the operator to rename
the offending tables before trying again. Additionally, this check only
applies to tables that otherwise follow the Hive table naming rules
(matching regex '[\w_/]+\.[\w_/]+').

WIP: I'm unsure about these details of the patch:

* Database names and table names are treated the same, including case
  preservation. This may lead to confusion since Kudu tables 'foo.bar'
  and 'FOO.baz' will be in the same Hive database.

* When a client opens a table the name stored in the table object is the
  name the client specifies.  So if table 'default.FOO' is opened with name
  'Default.Foo', the table object will internally hold the name
  'Default.Foo'.

Both of these seem like the right thing to do to me, but may be a source
of confusion.

Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
9 files changed, 361 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 3:

(4 comments)

Will fixup remaining comments with code changes.

http://gerrit.cloudera.org:8080/#/c/10817/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/3//COMMIT_MSG@54
PS3, Line 54: source
            : of confusion
> I feel confusion might be mitigated if we always store the normalized form,
The main concern with storing normalized table names is that a client that issues CREATE TABLE FOO, then a ListTables RPC will receive [ "foo" ] and not ["FOO"].  I think the patch as implemented is overall less surprising given the existing Kudu semantics.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/hms/hms_catalog.cc@105
PS3, Line 105: '_', and '/'
> Is this naming restriction configurable in HMS?
The HMS appears to always allow underscores, and allows forward slashes by default, but can be configured not to allow forward slashes.  When running against an HMS which is configured to disallow '/' the HMS should maintain the invariant so we don't need to do anything special.  We could change this error message to be more generic like '... identifier pair, each which must be Hive compatible', but then our users will have to go and look up Hive identifier rules, and their docs are _awful_.

HMS checks identifiers here: https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L486


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@1488
PS3, Line 1488:   if (FLAGS_catalog_manager_check_ts_count_for_create_table &&
remove this, it was just to reduce log spam in tests.


http://gerrit.cloudera.org:8080/#/c/10817/3/src/kudu/master/catalog_manager.cc@2301
PS3, Line 2301: In this case
> Since without HMS integration, we do not allow rename to same table name. I
If we didn't allow it there would be no way to change the non-normalized name of a table, and the non-normalized name is user visible via the ListTables RPC.



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 23:52:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 )

Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10817/1//COMMIT_MSG@39
PS1, Line 39: (\w_/)
should be braces



-- 
To view, visit http://gerrit.cloudera.org:8080/10817
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd
Gerrit-Change-Number: 10817
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Jun 2018 20:40:30 +0000
Gerrit-HasComments: Yes