Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/10/30 21:15:18 UTC

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

John Russell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8418 )


Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................

IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
---
M docs/topics/impala_known_issues.xml
1 file changed, 43 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/8418/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 3: Code-Review+2

Sorry for the slow turnaround, working through my backlog.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 23:40:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8418

to look at the new patch set (#2).

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................

IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
---
M docs/topics/impala_known_issues.xml
1 file changed, 45 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/8418/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@936
PS1, Line 936:             Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
This unfortunately won't give accurate info for all queries: if the query isn't materialising any columns (e.g. count(*)) or the file is filtered out by runtime filters, the reported file compression was inaccurate in previous versions. See IMPALA-5311 and IMPALA-4863, respectively.

One way to tell for sure is to run something like "select * from table" and then look at the profile. Or, say, "select min(string_col) from table".
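
A minimal sketch of that check, assuming the text profile of such a query has been saved to a string and that the scan node reports its formats on a line beginning with "File Formats:". The function names and the sample excerpt are hypothetical and only illustrate the idea; the exact profile layout may differ between versions.

```python
import re


def file_formats_in_profile(profile_text):
    """Return the value of every 'File Formats:' line in a text profile."""
    # Assumption: each scan node prints one line such as
    # "- File Formats: PARQUET/NONE:12".
    pattern = re.compile(r"File Formats:\s*(.+)")
    return [m.group(1).strip() for m in pattern.finditer(profile_text)]


def scans_uncompressed_parquet(profile_text):
    """True if any scan node in the profile read PARQUET/NONE data."""
    return any("PARQUET/NONE" in value
               for value in file_formats_in_profile(profile_text))


# Hypothetical excerpt, not copied from a real profile.
sample = """
HDFS_SCAN_NODE (id=0):
   - File Formats: PARQUET/NONE:12
"""

print(scans_uncompressed_parquet(sample))
```

The result is only meaningful for a profile from a query that materialises at least one column, for the reasons given above.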


http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@937
PS1, Line 937:             suspected table. Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph>
It might be helpful to note common cases where uncompressed Parquet is/isn't created. Impala generates Snappy-compressed Parquet by default unless compression_codec is changed. Most uncompressed Parquet we see in the wild is generated by Hive or other non-Impala tools.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:35:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
> I agree it's a little strange. But Cloudera issued a technical service bull
I think TSBs probably use a different scale though. Most of the medium severity issues here wouldn't warrant a TSB. This bug is definitely worse than the ABS bug below.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:18:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
Missed this on the first pass - shouldn't this be high?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:48:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
> Missed this on the first pass - shouldn't this be high?
I agree it's a little strange. But Cloudera issued a technical service bulletin (TSB) about the issue and it was only rated "medium" there. We are in somewhat new territory with how the TSB info relates to the upstream docs. My impulse was not to deviate greatly from whatever info came from the support group.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:35:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
> I think TSBs probably use a different scale though. Most of the medium seve
Sure! Changed.


http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
> I agree it's a little strange. But Cloudera issued a technical service bull
Done


http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951
PS2, Line 951: Medium
> Missed this on the first pass - shouldn't this be high?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 22:21:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................

IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Reviewed-on: http://gerrit.cloudera.org:8080/8418
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_known_issues.xml
1 file changed, 45 insertions(+), 0 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@936
PS1, Line 936:             Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
> This unfortunately won't give accurate info for all queries: if the query i
Done


http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@937
PS1, Line 937:             suspected table. Use a query that performs a full table scan, and materializes the column
> It might be helpful to note common cases where uncompressed Parquet is/isn'
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 20:37:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Nov 2017 02:15:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/177/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Nov 2017 02:07:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Greg Rahn, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8418

to look at the new patch set (#3).

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................

IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
---
M docs/topics/impala_known_issues.xml
1 file changed, 45 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/8418/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 1:

This was announced by Cloudera as TSB-225. Traditionally, TSBs get an equivalent Known Issue or similar in release notes. I think the best approach is to make a version-agnostic issue in upstream docs and be more explicit about fixed CDH maintenance releases in the equivalent downstream note.


-- 
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:20:34 +0000
Gerrit-HasComments: No