You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2019/04/30 14:22:13 UTC

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13190


Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 78 insertions(+), 50 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13190/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13190

to look at the new patch set (#5).

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which lead to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 80 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13190/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3002/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Apr 2019 15:16:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13190/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/13190/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@469
PS2, Line 469:   // Note that this test used to be flaky, so it was removed (6 years ago). I added it
> Yeah I think we'd have noticed if stats were disappearing in our Jenkin's r
Done


http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py@141
PS2, Line 141:   def test_explain_row_size_estimates(self, vector, unique_database):
> My initial reaction was that this should be a planner test. I can see, thou
I agree that it would be better to check this among planner tests - if the planner acts in an unexpected way, then it is the best to catch it in planner tests. My laziness says that it has a nice place near the similar test_explain_validate_cardinality_estimates.

There was a benefit in doing it here - it made me realize that test_explain_validate_cardinality_estimates is totally broken, it accepts any cardinality as its regexp no longer matches the lines it should.


http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py@142
PS2, Line 142:     # Tests that EXPLAIN returns the expected row sizes with and without stats.
> Should use a """ """ docstring for method comments
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 14:31:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13190

to look at the new patch set (#3).

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which lead to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 85 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13190/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3036/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 15:01:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which lead to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Reviewed-on: http://gerrit.cloudera.org:8080/13190
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 80 insertions(+), 50 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3041/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 19:23:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13190/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/13190/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@469
PS2, Line 469:   // Note that this test used to be flaky, so it was removed (6 years ago). I added it
Yeah I think we'd have noticed if stats were disappearing in our Jenkin's runs. Could just remove this comment.


http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py@141
PS2, Line 141:   def test_explain_row_size_estimates(self, vector, unique_database):
My initial reaction was that this should be a planner test. I can see, though, that there's some value in having an end-to-end test like this since it does actually test that the stats are handled correctly end-to-end from compute stats to the planner. Can you add a sentence to the method comment to justify why this should be an end-to-end test. We don't want future people blindly copying this instead of writing a planner test.


http://gerrit.cloudera.org:8080/#/c/13190/2/tests/metadata/test_explain.py@142
PS2, Line 142:     # Tests that EXPLAIN returns the expected row sizes with and without stats.
Should use a """ """ docstring for method comments



-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Apr 2019 16:02:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13190/4/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/13190/4/tests/metadata/test_explain.py@87
PS4, Line 87:     assert found_match
Can you make sure these asserts print the output with the missing string? Makes it easier to debug if it fails in future.

  assert found_match, query_result



-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 17:35:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13190

to look at the new patch set (#4).

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which lead to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 80 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13190/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 18:40:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13190

to look at the new patch set (#2).

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
---
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M tests/metadata/test_explain.py
5 files changed, 78 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/13190/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 23:58:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3037/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 15:11:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3001/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Apr 2019 15:08:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

Carry +2 from Tim.

http://gerrit.cloudera.org:8080/#/c/13190/4/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/13190/4/tests/metadata/test_explain.py@87
PS4, Line 87:     assert found_match, query_result
> Can you make sure these asserts print the output with the missing string? M
Thanks, good idea!



-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 18:39:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8409: Fix row-size for STRING columns with unknown stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13190 )

Change subject: IMPALA-8409: Fix row-size for STRING columns with unknown stats
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4143/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/13190
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Gerrit-Change-Number: 13190
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 May 2019 18:40:29 +0000
Gerrit-HasComments: No