Posted to reviews@impala.apache.org by "John Russell (Code Review)" <ge...@cloudera.org> on 2017/06/02 23:01:12 UTC

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7068

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................

IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Just putting an initial stake in the ground. If examples,
details of Hive interoperability, or type-by-type details
are needed, I prefer to handle those in followup gerrits.

Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
---
M docs/topics/impala_parquet.xml
1 file changed, 18 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7068/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

LGTM, only nits.

http://gerrit.cloudera.org:8080/#/c/7068/2/docs/topics/impala_parquet.xml
File docs/topics/impala_parquet.xml:

PS2, Line 367: include
nit: includes


Line 369:         a particular Parquet file has a minimum value of 1 and a maximum value of 100, then 
nit: trailing whitespace



Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................

IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Just putting an initial stake in the ground. If examples,
details of Hive interoperability, or type-by-type details
are needed, I prefer to handle those in followup gerrits.

Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
---
M docs/topics/impala_parquet.xml
1 file changed, 19 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7068/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7068/2/docs/topics/impala_parquet.xml
File docs/topics/impala_parquet.xml:

PS2, Line 367: include
> nit: includes
Done


Line 369:         a particular Parquet file has a minimum value of 1 and a maximum value of 100, then 
> nit: trailing whitespace
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Just putting an initial stake in the ground. If examples,
details of Hive interoperability, or type-by-type details
are needed, I prefer to handle those in followup gerrits.

Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Reviewed-on: http://gerrit.cloudera.org:8080/7068
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_parquet.xml
1 file changed, 19 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 3: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/131/


Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7068

to look at the new patch set (#3).

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................

IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Just putting an initial stake in the ground. If examples,
details of Hive interoperability, or type-by-type details
are needed, I prefer to handle those in followup gerrits.

Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
---
M docs/topics/impala_parquet.xml
1 file changed, 19 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7068/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "John Russell (Code Review)" <ge...@cloudera.org>.
John Russell has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7068/1/docs/topics/impala_parquet.xml
File docs/topics/impala_parquet.xml:

PS1, Line 363: row group 
> Not sure what "data block" means. "each row group and data page" would be m
Done


PS1, Line 366: when reading eac
> "parts of each file", because it could be a data page or row group.
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7068/1/docs/topics/impala_parquet.xml
File docs/topics/impala_parquet.xml:

PS1, Line 363: data block
Not sure what "data block" means. "each row group and data page" would be more precise.

I feel like the current text may confuse readers about what is in Parquet files in general versus how Impala writes out files versus what Impala actually makes use of on the read path right now.

Currently both Impala and other tools write out stats at both the row group and data page level. Data pages are the finer granularity; row groups are much coarser. I think the salient fact is that there are typically only a small number of row groups per file (1 for Impala).

Impala currently uses only the row group-level statistics, to skip over large parts of the file at a time, but we have plans to use the page-level statistics as well.
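
To make the row group-level skipping concrete, here is a minimal sketch in Python. It is not Impala's implementation; the names RowGroupStats, can_skip and row_groups_to_read are hypothetical and exist only for illustration. It assumes each row group carries a min and max value for the filtered column and that the query predicate is an inclusive range.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one column's min/max stats in one row group."""
    min_value: int
    max_value: int


def can_skip(stats: RowGroupStats, lo: int, hi: int) -> bool:
    """Return True if no value in the row group can satisfy lo <= x <= hi."""
    return stats.max_value < lo or stats.min_value > hi


def row_groups_to_read(groups: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Return the indexes of the row groups that still have to be scanned."""
    return [i for i, g in enumerate(groups) if not can_skip(g, lo, hi)]


# Three illustrative row groups (made-up values, not from a real file).
groups = [RowGroupStats(1, 100), RowGroupStats(101, 250), RowGroupStats(90, 120)]

# Equivalent of: WHERE x BETWEEN 150 AND 200
print(row_groups_to_read(groups, 150, 200))
```

A row group whose values span 1 to 100 can be skipped for a predicate such as x > 100, because its maximum rules out any match. Overlapping ranges, like the third group above, cannot be skipped when the predicate range touches them, which is why the benefit depends on how the data is laid out.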


PS1, Line 366: whether the file
"parts of each file", because it could be a data page or row group.



Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization
......................................................................


Patch Set 3: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: John Russell <jr...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No