Posted to reviews@impala.apache.org by "Xianqing He (Code Review)" <ge...@cloudera.org> on 2021/07/16 08:16:41 UTC

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Xianqing He has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17688 )


Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when check if the column
is ambiguous.

The fix is to use LinkedHashMap to search and check if we have already
seen the alias.
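
As a rough illustration of why the extra map helps, here is a minimal sketch
that is not taken from the patch. It assumes the slow lookup behaves like a
linear scan over previously seen entries; the class and method names are
hypothetical, and plain strings stand in for the analyzer's expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the alias bookkeeping; this is not Impala code.
final class AliasLookupSketch {
  // Quadratic variant: every alias is compared against all earlier aliases.
  static List<String> duplicatesByScan(List<String> aliases) {
    List<String> seen = new ArrayList<>();
    List<String> duplicates = new ArrayList<>();
    for (String alias : aliases) {
      if (seen.contains(alias)) {  // walks the whole list for each alias
        duplicates.add(alias);
      } else {
        seen.add(alias);
      }
    }
    return duplicates;
  }

  // Linear variant: the map answers "already seen?" in expected constant time
  // and still remembers the order in which aliases were first added.
  static List<String> duplicatesByMap(List<String> aliases) {
    Map<String, Integer> firstPosition = new LinkedHashMap<>();
    List<String> duplicates = new ArrayList<>();
    for (int i = 0; i < aliases.size(); i++) {
      String alias = aliases.get(i);
      if (firstPosition.containsKey(alias)) {
        duplicates.add(alias);
      } else {
        firstPosition.put(alias, i);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    List<String> aliases = Arrays.asList("c1", "c2", "c1", "c3", "c2");
    System.out.println(duplicatesByScan(aliases));
    System.out.println(duplicatesByMap(aliases));
  }
}
```

Both methods report the same duplicates; only the cost per lookup differs,
which is what matters once the select list reaches thousands of entries.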

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to 2.5 sec.
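
A query of this shape is tedious to write by hand. The following is one
possible way to generate it, assuming a table t with a column c1 as in the
form shown above; the class and method names are hypothetical and are not
part of the patch.

```java
// Hypothetical helper that builds the repro query text; not part of Impala.
final class ReproQuerySketch {
  // Builds "with a as (select c1 c1, c1 c2, ... from t) select c1, c2, ... from a"
  // with numExprs aliased expressions.
  static String buildQuery(int numExprs) {
    StringBuilder inner = new StringBuilder();
    StringBuilder outer = new StringBuilder();
    for (int i = 1; i <= numExprs; i++) {
      if (i > 1) {
        inner.append(", ");
        outer.append(", ");
      }
      inner.append("c1 c").append(i);  // same source column, new alias
      outer.append("c").append(i);     // reference the alias from the view
    }
    return "with a as (select " + inner + " from t) select " + outer + " from a";
  }

  public static void main(String[] args) {
    // Print a small instance; pass 10000 to get the size used in the test.
    System.out.println(buildQuery(3));
  }
}
```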

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
1 file changed, 6 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17688/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 1
Gerrit-Owner: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 6: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 02:03:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 1: Code-Review+1

(2 comments)

Good find. 10K exprs in a select list is certainly pushing the boundaries.

http://gerrit.cloudera.org:8080/#/c/17688/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17688/1//COMMIT_MSG@11
PS1, Line 11: check
nit: 'checking'


http://gerrit.cloudera.org:8080/#/c/17688/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17688/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@331
PS1, Line 331:       Map<String, Expr> existingAliasExprs = new LinkedHashMap<>();
nit: pls add a comment that this additional map is used for performance reasons and not for finding ambiguous alias.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 1
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jul 2021 18:21:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9108/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sat, 17 Jul 2021 02:54:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17688/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/17688/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@a271
PS4, Line 271: 
Sorry that I think your comment makes more sense: https://issues.apache.org/jira/browse/IMPALA-10799?focusedCommentId=17382425&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17382425

Let's move this line into the above if-clause instead of removing it.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sun, 18 Jul 2021 01:46:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to use LinkedHashMap to search and check if we have already
seen the alias.

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to 2.5 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
1 file changed, 6 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17688/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7306/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 1
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jul 2021 14:35:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to
1. Use LinkedHashMap to search and check if we have already seen the alias.
2. Remove the checkComposedFrom() check since the code has been mature
   for a while.

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to less than 1 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 8 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17688/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17688/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/17688/4/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@a271
PS4, Line 271: 
> Sorry that I think your comment makes more sense: https://issues.apache.org
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 02:51:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7318/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 02:03:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17688/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17688/1//COMMIT_MSG@11
PS1, Line 11: check
> nit: 'checking'
Done


http://gerrit.cloudera.org:8080/#/c/17688/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17688/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@331
PS1, Line 331:       // This additional map is used for performance reasons and not for finding
> nit: pls add a comment that this additional map is used for performance rea
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sat, 17 Jul 2021 02:27:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to
1. Use LinkedHashMap to search and check if we have already seen the alias.
2. Run the checkComposedFrom() check only when the log level is TRACE, since
   the code has been mature for a while.
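
The second item above follows a common pattern: keep an expensive sanity
check in the code, but only pay for it when the most verbose log level is
on. A minimal sketch of that pattern follows. It is not the patch itself:
java.util.logging's FINEST level stands in for TRACE, and every class,
method, and field name is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: all names here are hypothetical, not Impala code.
final class TraceGatedCheckSketch {
  static final Logger LOG = Logger.getLogger(TraceGatedCheckSketch.class.getName());

  // Counts how often the expensive check actually ran.
  static int checksRun = 0;

  // Stand-in for a costly consistency check: verifies the input is sorted.
  static boolean expensiveCheck(int[] values) {
    checksRun++;
    for (int i = 1; i < values.length; i++) {
      if (values[i - 1] > values[i]) return false;
    }
    return true;
  }

  // Runs the check only when the most verbose log level is enabled.
  static void analyze(int[] values) {
    if (LOG.isLoggable(Level.FINEST)) {
      if (!expensiveCheck(values)) {
        throw new IllegalStateException("consistency check failed");
      }
    }
    // ... the regular analysis work would continue here ...
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    analyze(new int[] {1, 2, 3});
    System.out.println("checks run at INFO: " + checksRun);

    LOG.setLevel(Level.FINEST);
    analyze(new int[] {1, 2, 3});
    System.out.println("checks run at FINEST: " + checksRun);
  }
}
```

The trade-off is that a regression in the checked invariant would go
unnoticed in normal runs and only surface when verbose logging is enabled.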

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to less than 1 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 9 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17688/5
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9102/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 1
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jul 2021 08:44:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 5: Code-Review+2

LGTM. Bumping Aman's +1 and mine to +2. Thanks, Xianqing!


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 02:02:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9107/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sat, 17 Jul 2021 02:52:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to use LinkedHashMap to search and check if we have already
seen the alias.

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to 2.5 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
1 file changed, 8 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17688/3
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 6: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 08:22:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7306/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 1
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Jul 2021 08:18:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9112/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 03:04:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9109/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sat, 17 Jul 2021 06:37:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10799: Analysis slowdown with inline views and thousands of column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17688 )

Change subject: IMPALA-10799: Analysis slowdown with inline views and thousands of column
......................................................................

IMPALA-10799: Analysis slowdown with inline views and thousands of column

If there are thousands of columns in the inlineview, it's very slow in
analysis. Most of the cost is in the get() calls used to find
expressions in the local substitution map when checking if the column
is ambiguous.

The fix is to
1. Use LinkedHashMap to search and check if we have already seen the alias.
2. Run the checkComposedFrom() check only when the log level is TRACE, since
   the code has been mature for a while.

Testing:
Performance testing with a query with 10000 expressions of the
following form:
  with a as (select c1 c1, c1 c2, c1 c3, ... from t)
  select c1, c2, c3, ... from a;
repro query analysis went from 7.5 sec to less than 1 sec.

Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Reviewed-on: http://gerrit.cloudera.org:8080/17688
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 9 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I43da47dddfdb3db6d0e2073ae974a0a4d1b3ad7c
Gerrit-Change-Number: 17688
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>