You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2018/07/12 22:35:29 UTC

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Hello Adar Dembo, Hao Hao, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10934

to review the following change.


Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................

hms-tool: filter non-Kudu tables in the HMS

Renames HmsCatalog::RetrieveTables to GetKuduTables and changes the
semantics such that only Kudu tables are returned. The only caller was
the HMS tool, and it only needs to inspect Kudu tables.

Special APIs are added to the HMS client and HMS catalog layers that
allow sending a filter to the HMS to strip out non-Kudu tables when
listing tables, as well as a bulk-get API. The combination of these APIs
should be significantly more efficient than issuing a get for every
single table in the HMS and doing Kudu-side filtering.

Also included are some style and formatting fixups in HMS tool.

This patch includes no functional changes to the HMS tool.

Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/hms/hms_client.cc
M src/kudu/hms/hms_client.h
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/mini-cluster/external_mini_cluster-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_hms.cc
10 files changed, 170 insertions(+), 92 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/10934/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG@15
PS2, Line 15: The combination of these APIs
            : should be significantly more efficient than issuing a get for every
            : single table in the HMS and doing Kudu-side filtering.
> Sounds great. Is there anything you can measure to confirm the efficiency g
I don't have numbers yet, because I haven't started stress-testing yet.  There are theoretical changes in the number of RPCs sent to the HMS patch, though:

before: RetrieveTables executes 1 request + 1 request per datatabase in the HMS + 1 request per table in the HMS, including fetching all Hive table objects.  This can be a large amount of data, since parquet tables can have thousands of partitions, each of which has non-negligable data associated.

after: GetKuduTables executes 1 request + 2 requests per databases.  Only Kudu table objects are retrieved, which don't have partitions.


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc@429
PS2, Line 429: hive::Table
> Nit: auto
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc@219
PS2, Line 219: const auto &
> Nit: const auto&
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h
File src/kudu/hms/hms_client.h:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h@179
PS2, Line 179:   // Retrieves HMS table metadata for many tables.
> Nit: how about "for all tables listed in 'table_names'", to be more precise
Done


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@a378
PS2, Line 378: 
             : 
> Nit: What do you have against passing smart pointers by cref?
I find the double indirection confusing while reading code, and there's arguably a runtime cost.  Mostly that it's just confusing, though.  const T& and T* are unambiguous in that the first means only const methods may be called, while the second typically means non-const methods will be called.  const unique_ptr<T>& completely breaks this pattern.  Additionally, we sometimes use crefs to smart pointers to represent optional values, but that's not the case here.


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@299
PS2, Line 299: isSynced
> Nit: since you're already in the area, this should be IsSynced.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 13 Jul 2018 00:27:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10934/1/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/1/src/kudu/hms/hms_catalog.cc@228
PS1, Line 228:               Substitute("$0$1 = \"$2\"",
> does this support OR or IN predicates? If so, maybe we could make this twic
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 12 Jul 2018 23:32:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10934

to look at the new patch set (#2).

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................

hms-tool: filter non-Kudu tables in the HMS

Renames HmsCatalog::RetrieveTables to GetKuduTables and changes the
semantics such that only Kudu tables are returned. The only caller was
the HMS tool, and it only needs to inspect Kudu tables.

Special APIs are added to the HMS client and HMS catalog layers that
allow sending a filter to the HMS to strip out non-Kudu tables when
listing tables, as well as a bulk-get API. The combination of these APIs
should be significantly more efficient than issuing a get for every
single table in the HMS and doing Kudu-side filtering.

Also included are some style and formatting fixups in HMS tool.

This patch includes no functional changes to the HMS tool.

Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/hms/hms_client.cc
M src/kudu/hms/hms_client.h
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/mini-cluster/external_mini_cluster-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_hms.cc
10 files changed, 165 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/10934/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10934/1/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/1/src/kudu/hms/hms_catalog.cc@228
PS1, Line 228:               Substitute("$0$1 = \"$2\"",
does this support OR or IN predicates? If so, maybe we could make this twice as fast by avoiding the for loop above?



-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 12 Jul 2018 22:52:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10934/2//COMMIT_MSG@15
PS2, Line 15: The combination of these APIs
            : should be significantly more efficient than issuing a get for every
            : single table in the HMS and doing Kudu-side filtering.
Sounds great. Is there anything you can measure to confirm the efficiency gain?


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog-test.cc@429
PS2, Line 429: hive::Table
Nit: auto


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_catalog.cc@219
PS2, Line 219: const auto &
Nit: const auto&


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h
File src/kudu/hms/hms_client.h:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/hms/hms_client.h@179
PS2, Line 179:   // Retrieves HMS table metadata for many tables.
Nit: how about "for all tables listed in 'table_names'", to be more precise?


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc
File src/kudu/tools/tool_action_hms.cc:

http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@a378
PS2, Line 378: 
             : 
Nit: What do you have against passing smart pointers by cref?


http://gerrit.cloudera.org:8080/#/c/10934/2/src/kudu/tools/tool_action_hms.cc@299
PS2, Line 299: isSynced
Nit: since you're already in the area, this should be IsSynced.



-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 13 Jul 2018 00:03:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 13 Jul 2018 00:34:34 +0000
Gerrit-HasComments: No

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Hao Hao, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10934

to look at the new patch set (#3).

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................

hms-tool: filter non-Kudu tables in the HMS

Renames HmsCatalog::RetrieveTables to GetKuduTables and changes the
semantics such that only Kudu tables are returned. The only caller was
the HMS tool, and it only needs to inspect Kudu tables.

Special APIs are added to the HMS client and HMS catalog layers that
allow sending a filter to the HMS to strip out non-Kudu tables when
listing tables, as well as a bulk-get API. The combination of these APIs
should be significantly more efficient than issuing a get for every
single table in the HMS and doing Kudu-side filtering.

Also included are some style and formatting fixups in HMS tool.

This patch includes no functional changes to the HMS tool.

Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/hms/hms_client.cc
M src/kudu/hms/hms_client.h
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/mini-cluster/external_mini_cluster-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_hms.cc
10 files changed, 167 insertions(+), 94 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/10934/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] hms-tool: filter non-Kudu tables in the HMS

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10934 )

Change subject: hms-tool: filter non-Kudu tables in the HMS
......................................................................

hms-tool: filter non-Kudu tables in the HMS

Renames HmsCatalog::RetrieveTables to GetKuduTables and changes the
semantics such that only Kudu tables are returned. The only caller was
the HMS tool, and it only needs to inspect Kudu tables.

Special APIs are added to the HMS client and HMS catalog layers that
allow sending a filter to the HMS to strip out non-Kudu tables when
listing tables, as well as a bulk-get API. The combination of these APIs
should be significantly more efficient than issuing a get for every
single table in the HMS and doing Kudu-side filtering.

Also included are some style and formatting fixups in HMS tool.

This patch includes no functional changes to the HMS tool.

Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Reviewed-on: http://gerrit.cloudera.org:8080/10934
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/hms/hms_catalog-test.cc
M src/kudu/hms/hms_catalog.cc
M src/kudu/hms/hms_catalog.h
M src/kudu/hms/hms_client-test.cc
M src/kudu/hms/hms_client.cc
M src/kudu/hms/hms_client.h
M src/kudu/integration-tests/master_hms-itest.cc
M src/kudu/mini-cluster/external_mini_cluster-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_hms.cc
10 files changed, 167 insertions(+), 94 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10934
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5f83d2e705ea6910a9aa0a1eda0d30b5feb2607b
Gerrit-Change-Number: 10934
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>