You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Hao Hao (Code Review)" <ge...@cloudera.org> on 2019/04/22 05:40:19 UTC

[kudu-CR] hms: skip drop table notification log for external table

Hao Hao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13073


Change subject: hms: skip drop table notification log for external table
......................................................................

hms: skip drop table notification log for external table

Currently, we enforce that Kudu's and the HMS's tables were always synced.
However for backwards compatibility, we need to support external tables
(i.e. tables that are not dropped from Kudu when dropped in Impala).

This patch introduces two changes to support the above:
1) distinguish the tables by type (managed or external) during metadata
synchronization between Kudu and the HMS. When dropping an external table
in the HMS, tables in Kudu will not be dropped.
2) mark tables created through Kudu as managed tables.

Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
---
M src/kudu/hms/hms_catalog.cc
M src/kudu/integration-tests/hms_itest-base.cc
M src/kudu/integration-tests/hms_itest-base.h
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/master/hms_notification_log_listener.cc
M src/kudu/tools/tool_action_hms.cc
6 files changed, 54 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/13073/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@335
PS1, Line 335: Status HmsCatalog::PopulateTable(const string& id,
Might want to rename this to PopulateManagedTable to emphasize that, for use cases like IsSynced() in tool_action_hms.cc, there needs to be some explicit overriding in order to ensure that the right table type is used.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@357
PS1, Line 357: will _be_
The emphasis is probably better suited for _will_ rather than _be_. But I'm reading the paragraph a few times and it doesn't make sense to me: yes, we'll delete the HDFS directory, but why should we do that, given the race described in the second half of the paragraph?

I think you're trying to explain that the client to this function asked for a Kudu managed table so we need to provide that, but then to mention that, unfortunately, this introduces a particular race. Speaking of, is that a race that now needs to be handled?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.h
File src/kudu/integration-tests/hms_itest-base.h:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.h@55
PS1, Line 55:                            const std::string& table_type);
This can only be managed or external, right? If so, use an enum instead?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc
File src/kudu/integration-tests/hms_itest-base.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@140
PS1, Line 140:   if (table_type == HmsClient::kExternalTable) {
             :     table.parameters[HmsClient::kExternalTableKey] = "TRUE";
             :   }
Could you doc why we need to set both table.tableType as well as the table parameter? Is it a workaround for HIVE-19253?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@171
PS1, Line 171:   hms_table_copy.tableType = hms_table.tableType;
Should doc here why we need to override the tableType and parameter values set in PopulateTable().


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@173
PS1, Line 173:     hms_table_copy.parameters[HmsClient::kExternalTableKey] = "TRUE";
Doc the workaround here too so we can grep for it and remove it later.



-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 22 Apr 2019 21:24:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Patch Set 1:

(15 comments)

Sorry to do this post-review, but I broke this into two smaller patches:

https://gerrit.cloudera.org/c/13314/ hms: don't overwrite table type when updating metadata
https://gerrit.cloudera.org/c/13315/ hms: ignore DROP TABLE events from external tables

http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@9
PS1, Line 9: synced.
> nit: could you keep the lines in this messages under 72 symbols wide?
Done


http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@10
PS1, Line 10: external tables
> Do we have other cases besides Kudu+Impala integration where we have such '
I think managed/external tables are a concept in Hive as well, but for all intents and purposes, only Impala-Kudu users will use these tables (for now).


http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@16
PS1, Line 16: tables
> nit: the table ?
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@335
PS1, Line 335: Status HmsCatalog::PopulateTable(const string& id,
> Alternatively, pass in an enum that will populate the externalness in this 
I left it up to the caller with an enum.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@355
PS1, Line 355: 
             :   // TODO(hao): 
> What's the TODO in regards to this code block? That we should remove this c
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@357
PS1, Line 357: will _be_
> The emphasis is probably better suited for _will_ rather than _be_. But I'm
Yeah, I think she was just leaving open that such races will now exist.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.h
File src/kudu/integration-tests/hms_itest-base.h:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.h@55
PS1, Line 55:                            const std::string& table_type);
> This can only be managed or external, right? If so, use an enum instead?
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc
File src/kudu/integration-tests/hms_itest-base.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@140
PS1, Line 140:   if (table_type == HmsClient::kExternalTable) {
             :     table.parameters[HmsClient::kExternalTableKey] = "TRUE";
             :   }
> I'm not sure about I have the full context of kExternalTableKey=TRUE, but w
Done

I opted to keep it as is -- may be surprising to change the HMS table silently as I proposed.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@144
PS1, Line 144:   // The KuduMetastorePlugin only allows the master to alter a Kudu table,
             :   // so we pretend to be the master.
> Just curious, what would be the effect of not having this? Would the reques
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@148
PS1, Line 148: AlterTable
> I'm curious whether:
>it's necessary to set kKuduTableIdKey property as well?
This gets filled in PopulateTable().

>this method can be successfully called multiple times producing the same result (i.e. whether it's idempotent)?
I assume so. Why do you ask?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/master_hms-itest.cc@324
PS1, Line 324:   ASSERT_OK(client_->OpenTable("default.e", &table));
> Might it be any race here due to the nature of synchronization between Kudu
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@169
PS1, Line 169: boost::none,
> nit: can you annotate this variable with a comment?
Done


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@171
PS1, Line 171:   hms_table_copy.tableType = hms_table.tableType;
> Should doc here why we need to override the tableType and parameter values 
Pushed this elsewhere (PopulateTable()) so we don't need to overwrite anything.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@173
PS1, Line 173:     hms_table_copy.parameters[HmsClient::kExternalTableKey] = "TRUE";
> Doc the workaround here too so we can grep for it and remove it later.
Pushed this elsewhere (PopulateTable())


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@175
PS1, Line 175: hms_table_copy == hms_table
> I'm curious whether this comparison will work after overriding the HmsClien
If the kExternalTableKey isn't set in the HMS, I think it may be reasonable to consider this unsynchronized (ie the table _must_ have the `kExternalTableKey = TRUE` to be considered a valid external table). For now I've kept it as is.



-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 11 May 2019 02:32:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@9
PS1, Line 9: synced.
nit: could you keep the lines in this messages under 72 symbols wide?


http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@10
PS1, Line 10: external tables
Do we have other cases besides Kudu+Impala integration where we have such 'external' semantic?  If not, maybe change into 'external Kudu tables in Impala' ?  Or the whole idea is that with this change HMS can support external Kudu tables as well?


http://gerrit.cloudera.org:8080/#/c/13073/1//COMMIT_MSG@16
PS1, Line 16: tables
nit: the table ?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/master_hms-itest.cc@324
PS1, Line 324:   ASSERT_OK(client_->OpenTable("default.e", &table));
Might it be any race here due to the nature of synchronization between Kudu and HMS metadata?  Imagine HMS get a bug and sends table removal event to Kudu 5 seconds later.  How do we know that's not the case here?



-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 24 Apr 2019 06:50:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has abandoned this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc
File src/kudu/integration-tests/hms_itest-base.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@148
PS1, Line 148: AlterTable
I'm curious whether:
* it's necessary to set kKuduTableIdKey property as well?
* this method can be successfully called multiple times producing the same result (i.e. whether it's idempotent)?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@175
PS1, Line 175: hms_table_copy == hms_table
I'm curious whether this comparison will work after overriding the HmsClient::kExternalTableKey parameter in the 'hms_table_copy'?  Shouldn't we also somehow do the same for 'hms_table'?



-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 24 Apr 2019 07:41:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms: skip drop table notification log for external table

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13073 )

Change subject: hms: skip drop table notification log for external table
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@335
PS1, Line 335: Status HmsCatalog::PopulateTable(const string& id,
> Might want to rename this to PopulateManagedTable to emphasize that, for us
Alternatively, pass in an enum that will populate the externalness in this function. That way, the caller doesn't need to do explicit overwriting, and we just get the right hive table for the type of table we care about.

That might make the IsSynced function less confusing too IMO, avoiding overwriting the external field, and whatnot.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/hms/hms_catalog.cc@355
PS1, Line 355: 
             :   // TODO(hao): 
What's the TODO in regards to this code block? That we should remove this comment once HIVE-21640 is fixed?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc
File src/kudu/integration-tests/hms_itest-base.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@140
PS1, Line 140:   if (table_type == HmsClient::kExternalTable) {
             :     table.parameters[HmsClient::kExternalTableKey] = "TRUE";
             :   }
I'm not sure about I have the full context of kExternalTableKey=TRUE, but what do you think of plumbing this into HmsClient::GetTable? If this is important anywhere else, it seems really easy to overlook.


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/integration-tests/hms_itest-base.cc@144
PS1, Line 144:   // The KuduMetastorePlugin only allows the master to alter a Kudu table,
             :   // so we pretend to be the master.
Just curious, what would be the effect of not having this? Would the request be refused?


http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/13073/1/src/kudu/tools/tool_action_hms.cc@169
PS1, Line 169: boost::none,
nit: can you annotate this variable with a comment?



-- 
To view, visit http://gerrit.cloudera.org:8080/13073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ffb23d507cc0d9ba9e46983e5bbf6b7116f7515
Gerrit-Change-Number: 13073
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 23 Apr 2019 18:52:44 +0000
Gerrit-HasComments: Yes