You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Attila Jeges (Code Review)" <ge...@cloudera.org> on 2019/08/07 17:01:15 UTC

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Attila Jeges has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14032


Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................

WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

HIVE-22046 added 'engine' column to TAB_COL_STATS and PART_COL_STATS
HMS tables. The new column is used to differentiate among column stats
computed by different engines. The related HMS API calls were changed
accordingly.

This change sets 'engine' parameter to 'impala' in all the affected
HMS API calls.

For now this feature works with external tables only as COMPUTE STATS
is not yet supported for managed tables (see IMPALA-8836).

Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_hms_integration.py
7 files changed, 108 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14032/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................

WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

This patch-set is in WIP state since HIVE-22046 and the HMS API
changes introduced by it are not yet available for Impala via
native-toolchain.

HIVE-22046 added 'engine' column to TAB_COL_STATS and PART_COL_STATS
HMS tables. The new column is used to differentiate among column stats
computed by different engines. The related HMS API calls were changed
accordingly.

This change sets 'engine' parameter to 'impala' in all the affected
HMS API calls.

For now this feature works with external tables only as COMPUTE STATS
is not yet supported for managed tables (see IMPALA-8836).

Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_hms_integration.py
7 files changed, 108 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14032/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14032/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14032/1//COMMIT_MSG@7
PS1, Line 7: WIP:
Can you add information about why is this in WIP state? The change seems complete to me with tests and without TODOs.


http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@893
PS1, Line 893: '(x int)'
Can you also test what happens with partitioned tables?


http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@902
PS1, Line 902:         # Impala doesn't read column stats written by Hive.
             :         self.client.execute('invalidate metadata %s' % table_name)
             :         impala_stats = self.impala_all_column_stats(table_name)
             :         assert '-1' == impala_stats['x']['#nulls']
             :         assert '-1' == impala_stats['x']['ndv']
It would be a change in design, but wouldn't it make sense to use stats written by Hive if there are no stats written by Impala? This could avoid breaking some existing workflows non partitioned tables.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 12:53:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@893
PS1, Line 893: '(x int)'
> Hive stores column stats for partitioned tables in PART_COL_STATS while Imp
I would prefer to have that test here (I guess that it would't be a big change, only CREATE TABLE and INSERT would need to change. testing numRows would be also nice, even if it is not changed in this patch, as the test would become a good summary of Impala-Hive stat interop.

I am ok with just mentioning these in a comment and probably doing it later.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 18:44:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14032/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14032/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1366
PS2, Line 1366:               table.getDb().getName(), table.getName(), col.getName(), "Impala");
> Probably best to make this a constant - HMS_STATS_ENGINE_NAME or whatever
Done.

Moved deleteTableColumnStatistics() wrapper to MetastoreShim.


http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@893
PS1, Line 893: '(x int)'
> I would prefer to have that test here (I guess that it would't be a big cha
Done


http://gerrit.cloudera.org:8080/#/c/14032/2/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/2/tests/metadata/test_hms_integration.py@876
PS2, Line 876:   @pytest.mark.execute_serially
> Does this actually need to be serial? It looks like it is only operating on
True, done.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 20:58:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................

WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

This patch-set is in WIP state since HIVE-22046 and the HMS API
changes introduced by it are not yet available for Impala via
native-toolchain.

HIVE-22046 added 'engine' column to TAB_COL_STATS and PART_COL_STATS
HMS tables. The new column is used to differentiate among column stats
computed by different engines. The related HMS API calls were changed
accordingly.

This change sets 'engine' parameter to 'impala' in all the affected
HMS API calls.

For now this feature works with external tables only as COMPUTE STATS
is not yet supported for managed tables (see IMPALA-8836).

Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_hms_integration.py
7 files changed, 185 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14032/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has abandoned this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Abandoned

Accommodating HIVE-22046 in Impala will be done in multiple steps in a staged manner.
-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 2:

(2 comments)

Couple of minor comments but the change looks good once the hive version is available

http://gerrit.cloudera.org:8080/#/c/14032/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14032/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1366
PS2, Line 1366:               table.getDb().getName(), table.getName(), col.getName(), "Impala");
Probably best to make this a constant - HMS_STATS_ENGINE_NAME or whatever


http://gerrit.cloudera.org:8080/#/c/14032/2/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/2/tests/metadata/test_hms_integration.py@876
PS2, Line 876:   @pytest.mark.execute_serially
Does this actually need to be serial? It looks like it is only operating on its own tables independently.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 18:51:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14032/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14032/1//COMMIT_MSG@7
PS1, Line 7: WIP:
> Can you add information about why is this in WIP state? The change seems co
Done


http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@893
PS1, Line 893: '(x int)'
> Can you also test what happens with partitioned tables?
Hive stores column stats for partitioned tables in PART_COL_STATS while Impala uses TAB_COL_STATS (as described in L884-886), so in case of partition tables the Impala/Hive column statistics were properly separated even before this patch.

I can still add a partition-table test if you think it is necessary.


http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@902
PS1, Line 902:         # Impala doesn't read column stats written by Hive.
             :         self.client.execute('invalidate metadata %s' % table_name)
             :         impala_stats = self.impala_all_column_stats(table_name)
             :         assert '-1' == impala_stats['x']['#nulls']
             :         assert '-1' == impala_stats['x']['ndv']
> Agreed, I think this is important to think about, either to provide documen
As discussed with the other stakeholders for now we will keep the stats separately for simplicity/transparency reasons. We can revisit this behavior later.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 15:53:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4182/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 16:33:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14032/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/14032/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@142
PS3, Line 142:     return client.getTableColumnStatistics(dbName, tableName, colNames, /*engine*/IMPALA_ENGINE);
line too long (97 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 21:00:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4177/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Aug 2019 17:42:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/14032/1/tests/metadata/test_hms_integration.py@902
PS1, Line 902:         # Impala doesn't read column stats written by Hive.
             :         self.client.execute('invalidate metadata %s' % table_name)
             :         impala_stats = self.impala_all_column_stats(table_name)
             :         assert '-1' == impala_stats['x']['#nulls']
             :         assert '-1' == impala_stats['x']['ndv']
> It would be a change in design, but wouldn't it make sense to use stats wri
Agreed, I think this is important to think about, either to provide documentation or a solution in the code. We can't avoid breaking some workflows, e.g. if a user depends on one engine's refresh/analyze to keep stats fresh for all engines, but there are also stale stats for the other engine.

I think mostly we depend on users running Impala's compute stats on the tables they use with Impala - I believe they get column stats missing warnings if they don't do this since Hive's analyze (I think) doesn't fill in all the column stats.

But yeah, we should consider what mix of documentation and code we use to solve this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 15:00:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4185/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 21:40:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Attila Jeges (Code Review)" <ge...@cloudera.org>.
Attila Jeges has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................

WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

This patch-set is in WIP state since HIVE-22046 and the HMS API
changes introduced by it are not yet available for Impala via
native-toolchain.

HIVE-22046 added 'engine' column to TAB_COL_STATS and PART_COL_STATS
HMS tables. The new column is used to differentiate among column stats
computed by different engines. The related HMS API calls were changed
accordingly.

This change sets 'engine' parameter to 'impala' in all the affected
HMS API calls.

For now this feature works with external tables only as COMPUTE STATS
is not yet supported for managed tables (see IMPALA-8836).

Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_hms_integration.py
7 files changed, 184 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/14032/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14032 )

Change subject: WIP: IMPALA-8842: Change column stat HMS calls to accommodate HIVE-22046.
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4186/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia782bda3891dee1281cfcd5cf8817aa1226217ad
Gerrit-Change-Number: 14032
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <at...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 21:46:12 +0000
Gerrit-HasComments: No