Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2022/08/03 09:21:50 UTC

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18811 )


Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................

IMPALA-11469: Make prefix of ignored staging dirs configurable

External systems like Hive or Spark write temporary or "non-data"
files in the table location. Catalogd skips them when loading file
metadata. However, the prefixes it recognizes are currently hard-coded.
We recently found that Spark streaming generates a _spark_metadata dir,
which is not handled correctly.

To avoid future code changes when interacting with more systems, this
patch adds a new startup flag, ignored_dir_prefix_list, for catalogd.
It is a comma-separated list of prefixes of ignored dirs. Currently, the
default value is ".,_tmp.,_spark_metadata". Users can add more prefixes
in the future.
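
For illustration only, here is a minimal sketch of how a comma-separated
prefix list like the one described above could be parsed and matched
against directory names. The class and method names
(IgnoredDirPrefixSketch, parsePrefixes, isIgnoredDir) are hypothetical
and are not taken from the patch; the actual logic in
FileSystemUtil.java may differ, for example in how it handles whitespace
or nested paths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the code from this patch.
class IgnoredDirPrefixSketch {
  // Splits a comma-separated flag value into its non-empty prefixes.
  // Trimming whitespace is an assumption made for this sketch.
  static List<String> parsePrefixes(String flagValue) {
    List<String> prefixes = new ArrayList<>();
    for (String part : flagValue.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) prefixes.add(trimmed);
    }
    return prefixes;
  }

  // Returns true if the directory name starts with any configured prefix.
  static boolean isIgnoredDir(List<String> prefixes, String dirName) {
    for (String prefix : prefixes) {
      if (dirName.startsWith(prefix)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Default value quoted in the commit message.
    List<String> prefixes = parsePrefixes(".,_tmp.,_spark_metadata");
    String[] dirs = {".hive-staging_1", "_tmp.insert", "_spark_metadata", "year=2022"};
    for (String dir : dirs) {
      System.out.println(dir + " ignored: " + isIgnoredDir(prefixes, dir));
    }
  }
}
```

With the default value, the first three sample names match a prefix and
the partition-style name "year=2022" does not. The sample directory
names are made up for the example.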

Tests:
 - Add a case for _spark_metadata in FileSystemUtilTest

Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 50 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/18811/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................

IMPALA-11469: Make prefix of ignored staging dirs configurable

External systems like Hive or Spark write temporary or "non-data"
files in the table location. Catalogd skips them when loading file
metadata. However, the prefixes it recognizes are currently hard-coded.
We recently found that Spark streaming generates a _spark_metadata dir,
which is not handled correctly.

To avoid future code changes when interacting with more systems, this
patch adds a new startup flag, ignored_dir_prefix_list, for catalogd.
It is a comma-separated list of prefixes of ignored dirs. Currently, the
default value is ".,_tmp.,_spark_metadata". Users can add more prefixes
in the future.

Tests:
 - Add a case for _spark_metadata in FileSystemUtilTest

Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Reviewed-on: http://gerrit.cloudera.org:8080/18811
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 49 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8391/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 12:07:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3:

> Patch Set 3: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8396/

The failure is unrelated to this patch: IMPALA-11352.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 11:12:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8398/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 11:12:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 2: Code-Review+2

This makes sense to me


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 00:44:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8396/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 02:21:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 02:21:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11083/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 09:43:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 2: Code-Review+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 15:41:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 2: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 16:57:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8396/


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 07:19:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18811

to look at the new patch set (#2).

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................

IMPALA-11469: Make prefix of ignored staging dirs configurable

External systems like Hive or Spark write temporary or "non-data"
files in the table location. Catalogd skips them when loading file
metadata. However, the prefixes it recognizes are currently hard-coded.
We recently found that Spark streaming generates a _spark_metadata dir,
which is not handled correctly.

To avoid future code changes when interacting with more systems, this
patch adds a new startup flag, ignored_dir_prefix_list, for catalogd.
It is a comma-separated list of prefixes of ignored dirs. Currently, the
default value is ".,_tmp.,_spark_metadata". Users can add more prefixes
in the future.

Tests:
 - Add a case for _spark_metadata in FileSystemUtilTest

Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
5 files changed, 49 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/18811/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 3: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 16:02:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11469: Make prefix of ignored staging dirs configurable

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18811 )

Change subject: IMPALA-11469: Make prefix of ignored staging dirs configurable
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11082/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I108bfa823281a35d28932f7ccce0b12a0c5af57d
Gerrit-Change-Number: 18811
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 09:43:01 +0000
Gerrit-HasComments: No