You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Lars Volker (Code Review)" <ge...@cloudera.org> on 2016/06/23 21:14:28 UTC

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3474

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................

IMPALA-3776: fix 'describe formatted' for Avro tables

For Avro tables the column information in the underlying database of the
Hive metastore can be different from what is specified in the
'avro.schema.url' file. HIVE-6308 aimed to improve upon this, but for
older tables the two don't necessarily align.

There are two possible cases:

1) Hive's underlying database contains a column, which is not present in
the Avro schema file. In this case we encounter a NullPointerException
in DescribeResultFactory.java#L189 when trying to lookup the column in
the internal table object.

2) The avro schema contains a column, which is not present in the
underlying database. In this case the column will not be displayed in
describe formatted.

I don't know how to automatically test this, but I verified this
manually by creating an Avro table with an external schema file in Hive.
This populated the underlying database with the column information. I
then either removed a column from the Avro schema file (case 1) or
cleared the column information from the "COLUMNS_V2" table in the
underlying database (case 2) and verified that the change fixed both
cases.

Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
---
M fe/src/main/java/com/cloudera/impala/catalog/Column.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
3 files changed, 29 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/74/3474/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 2:

Moved to https://gerrit.cloudera.org/4126

-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has uploaded a new patch set (#2).

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................

IMPALA-3776: fix 'describe formatted' for Avro tables

For Avro tables the column information in the underlying database of the
Hive metastore can be different from what is specified in the
'avro.schema.url' file. HIVE-6308 aimed to improve upon this, but for
older tables the two don't necessarily align.

There are two possible cases:

1) Hive's underlying database contains a column which is not present in
the Avro schema file. In this case we encounter a NullPointerException
in DescribeResultFactory.java#L189 when trying to look up the column in
the internal table object.

2) The avro schema contains a column, which is not present in the
underlying database. In this case the column will not be displayed in
describe formatted.

I don't know how to automatically test this, but I verified this
manually by creating an Avro table with an external schema file in Hive.
This populated the underlying database with the column information. I
then either removed a column from the Avro schema file (case 1) or
cleared the column information from the "COLUMNS_V2" table in the
underlying database (case 2) and verified that the change fixed both
cases.

Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
---
M fe/src/main/java/com/cloudera/impala/catalog/Column.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
3 files changed, 29 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/74/3474/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 2: Code-Review-2

(6 comments)

Thanks for the review. I added tests and will push them directly to impala-asf. Sometime later I will abandon this change.

http://gerrit.cloudera.org:8080/#/c/3474/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-3776: fix 'describe formatted' for Avro tables
> For both cases below, is our output consistent with the regular describe (w
After this change, yes. Both commands eventually will call getNonClusteringColumns() and getClusteringColumns() in Table.java, although the code paths to there are very(!) different and should be unified in a subsequent change.


Line 11: 'avro.schema.url' file. HIVE-6308 aimed to improve upon this, but for
> Instead of 'avro.schema'url' let's just say Avro Schema because there are o
Done


Line 21: 2) The avro schema contains a column, which is not present in the
> Did this second case also have a bug that was fixed, or did it already work
This was also broken.


Line 25: I don't know how to automatically test this, but I verified this
> I think you might be able to test case 1) as follows:
Thank you for the hint, that led me in the right direction and I was able to test both cases. :)


http://gerrit.cloudera.org:8080/#/c/3474/2/fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
File fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java:

Line 187:     // table.
> table which has already reconciled those differences.
Done


Line 189:     msTable.setPartitionKeys(Column.toFieldSchemas(table.getClusteringColumns()));
> Nice solution!
It was Bharath's idea. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 2:

Please update to using the new gerrit project, "Impala-ASF".
Instructions are here:

https://cwiki.apache.org/confluence/display/IMPALA/How+to+switch+to+Apache-hosted+git

Pushes to this project will be disabled on October 1.

-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3474/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-3776: fix 'describe formatted' for Avro tables
For both cases below, is our output consistent with the regular describe (without formatted)?


Line 11: 'avro.schema.url' file. HIVE-6308 aimed to improve upon this, but for
Instead of 'avro.schema'url' let's just say Avro Schema because there are other ways of specifying an Avro Schema (e.g. avro.schema.literal)


Line 21: 2) The avro schema contains a column, which is not present in the
Did this second case also have a bug that was fixed, or did it already work?


Line 25: I don't know how to automatically test this, but I verified this
I think you might be able to test case 1) as follows:
- create an Avro table only with column definitions (no Avro schema)
- alter the table to give it an Avro schema that has fewer columns (use avro.schema.litera; for convenience)
- invalidate metadata on the table
- run describe formatted


http://gerrit.cloudera.org:8080/#/c/3474/2/fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
File fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java:

Line 187:     // table.
table which has already reconciled those differences.


Line 189:     msTable.setPartitionKeys(Column.toFieldSchemas(table.getClusteringColumns()));
Nice solution!


-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 1:

(3 comments)

this does need a test.

http://gerrit.cloudera.org:8080/#/c/3474/1//COMMIT_MSG
Commit Message:

Line 16: 1) Hive's underlying database contains a column, which is not present in
"a column which", unless you mean there's only a single column


Line 18: in DescribeResultFactory.java#L189 when trying to lookup the column in
to look up


http://gerrit.cloudera.org:8080/#/c/3474/1/fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
File fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java:

Line 187
what was the rationale for the logic below? (and why don't we need it anymore?)


-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has abandoned this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Abandoned

Moved and already merged

-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3776: fix 'describe formatted' for Avro tables

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-3776: fix 'describe formatted' for Avro tables
......................................................................


Patch Set 1:

(3 comments)

Thanks for the comments. I don't know how to automatically test this. For the manual tests I created an avro table and then updated the schema file for case 1 and deleted some entries in COLUMNS_V2 in the underlying postgres database for case 2. This is how we tested https://gerrit.cloudera.org/#/c/3446/, too.

If you have an idea how to automatically test this, please let me know.

http://gerrit.cloudera.org:8080/#/c/3474/1//COMMIT_MSG
Commit Message:

Line 16: 1) Hive's underlying database contains a column, which is not present in
> "a column which", unless you mean there's only a single column
Done


Line 18: in DescribeResultFactory.java#L189 when trying to lookup the column in
> to look up
Done


http://gerrit.cloudera.org:8080/#/c/3474/1/fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java
File fe/src/main/java/com/cloudera/impala/service/DescribeResultFactory.java:

Line 187
> what was the rationale for the logic below? (and why don't we need it anymo
The rational seems to have been that we transfer all comments from our internal table object's column list to the columns retrieved from the metastore. Since we now retrieve the columns from our internal table object altogether, including the comments, we don't need to copy them separately.


-- 
To view, visit http://gerrit.cloudera.org:8080/3474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic640f18acf7a1731f34b22c50ebbb462dfee78bd
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes