You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2021/02/20 16:46:12 UTC

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

scarlin@cloudera.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17093


Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................

IMPALA-10525: Add param to BuiltinsDb to defer initialization

BuiltinsDb currently initializes all the builtin functions on
initialization. Part of the initialization task is to interact with
the C++ code to fetch the signatures of the functions.  This doesn't
work if a third party extension wants to use the BuiltinDb but does
not have access to the C++ library at runtime.

The solution is to add a parameter to the initialization which will
allow the initialization to be deferred.

Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
---
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
1 file changed, 10 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17093/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <sc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@81
PS1, Line 81:   public static synchronized Db getInstance(boolean initBuiltins) {
Should we make 'initBuiltins' a static field so if someone call both getInstance(true) and getInstance(false) we can add Precondition checks to forbid it?


http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@99
PS1, Line 99: private
initBuiltins() is private. Will it be called after we init the singleton with initBuiltins=false?

It seems so since the commit message mentions "allow the initialization to be deferred".
If not, maybe we should rename 'initBuiltins' to something like 'withoutBuiltinFuncs' to be more specifit.



-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Feb 2021 09:02:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8178/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Feb 2021 17:21:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Steve Carlin (Code Review)" <ge...@cloudera.org>.
Steve Carlin has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................

IMPALA-10525: Add param to BuiltinsDb to defer initialization

BuiltinsDb currently initializes all the builtin functions on
initialization. Part of the initialization task is to interact with
the C++ code to fetch the signatures of the functions.  This doesn't
work if a third party extension wants to use the BuiltinsDb but does
not have access to the C++ library at runtime.

The solution is to add an alternative way to initalize the BuiltinsDb,
through a Loader class.

Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
---
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
1 file changed, 15 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/17093/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 2
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6913/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Feb 2021 02:53:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Feb 2021 02:53:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Feb 2021 08:41:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Steve Carlin (Code Review)" <ge...@cloudera.org>.
Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 1:

(3 comments)

For the most part, the BuiltIn instance will have the exact same contents. In my implementation though, I created a static file containing all the dirty details, like the C++ signatures.  After the Builtin instance is created, the "BuiltInDb.getInstance().addFunction is called for all the built in functions.

I went with this implementation because I was looking for minimal impact on the Impala code base.  I could go for something a bit more complicated if this doesn't meet code review requirements.

If we're concerned about immutability, one potential solution is to have a Loader interface that loads in the BuiltinDb the way an external user of the class prefers.  I think it would have to be a singleton as well and look like this:

private static volatile Loader loader;
public static setLoader(Loader loader) {
  this.loader = loader;
}

...and then in the Builtins constructor, call loader.initBuiltins(this);

The Impala version would have a BuiltinLoader class with the "initBuiltins()" method that contains all the static initializations.

Would that be more to your liking?

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@81
PS1, Line 81:   public static synchronized Db getInstance(boolean initBuiltins) {
> Should we make 'initBuiltins' a static field so if someone call both getIns
See general comment for perhaps a better way to implement.  

But if I stick with this, the thing is that in my implementation, it's really only slightly deferred.  The parameter only matters when the singleton hasn't been initialized.  After that, it doesn't matter if you call it with true or false.


http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@91
PS1, Line 91: if (initBuiltins) {
            :       initBuiltins();
            :     }
> This seems disable the init of all builtins. My concern is the following:
See general comment for perhaps a better way to implement.

But to address these comments:

The third party tool is responsible for all the initializations.  There's no mix.  And the 3rd party should match the functions that Impala uses to populate.

There are future plans to make sure that the Impala initialization matches the 3rd party initialization, but that is a bigger rewrite, and beyond the scope of what can be developed at this point.


http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@99
PS1, Line 99: private
> initBuiltins() is private. Will it be called after we init the singleton wi
See general comment for perhaps a better way to implement.

But to address this comment: 

The third part is using BuiltinDb.addFunction() to populate the functions. 

I can address the rename, will do so after discussing the other proposal I mentioned on the top level.



-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Feb 2021 17:08:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 2
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Feb 2021 02:52:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8199/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 2
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Feb 2021 02:59:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................

IMPALA-10525: Add param to BuiltinsDb to defer initialization

BuiltinsDb currently initializes all the builtin functions on
initialization. Part of the initialization task is to interact with
the C++ code to fetch the signatures of the functions.  This doesn't
work if a third party extension wants to use the BuiltinsDb but does
not have access to the C++ library at runtime.

The solution is to add an alternative way to initalize the BuiltinsDb,
through a Loader class.

Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Reviewed-on: http://gerrit.cloudera.org:8080/17093
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
1 file changed, 15 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 4
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 1:

(1 comment)

Sounds like a good idea to have a Loader and use it to verify that the content from the 3rd party tool is exactly the same as what is expected by Impala FE and all the way to BE.

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@91
PS1, Line 91: if (initBuiltins) {
            :       initBuiltins();
            :     }
> See general comment for perhaps a better way to implement.
Perhaps that a verifier should be coded to make sure that the 3rd party tools produce exact the same items as FE currently does.



-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Steve Carlin <sc...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Feb 2021 17:49:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10525: Add param to BuiltinsDb to defer initialization

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17093 )

Change subject: IMPALA-10525: Add param to BuiltinsDb to defer initialization
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/17093/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@91
PS1, Line 91: if (initBuiltins) {
            :       initBuiltins();
            :     }
This seems disable the init of all builtins. My concern is the following:
1. The 3rd party tools only provide some coverage in functionality, and relies on Impala's for the rest. If so, a reduced scope in bypassing the initBuiltins(this) calls would be better. 
2. FE uses the builtIn functions to process a query, such as one containing NDV() function. So disabling these builtins entirely have an impact. An alternative approach would be to just disable the loading of the C++ signature part, which makes it more inline with the requirement to manage 3rd party tools that do not have the C++ implementations.



-- 
To view, visit http://gerrit.cloudera.org:8080/17093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1941a2494efe15b63514d849873eeb1c4ed8a981
Gerrit-Change-Number: 17093
Gerrit-PatchSet: 1
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Feb 2021 14:00:39 +0000
Gerrit-HasComments: Yes