You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2019/04/15 23:50:45 UTC

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13020


Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  destroy and recreate this table. The table should be in the same
  region as the S3_BUCKET for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to destroy/create the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
---
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
8 files changed, 190 insertions(+), 117 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3309/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 May 2019 18:37:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179
PS5, Line 179:     check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location])
> I saw a failure here where s3guard showed the temporary file used for copyi
Makes sense. I'm wondering if we could add the -d option just for S3 (and do it for all calls to copyFromLocal). For HDFS, I think it makes sense to run without -d, as it makes the upload atomic. For S3, I think 99% of the time, you should run with -d. I don't think there is much benefit to writing data to a temp file on S3, and then re-copying it to the final file, plus I think S3 uploads are already atomic.

Not a blocker, so we could do this in a follow up JIRA, just a thought.

Yeah, I think your explanation is correct - e.g. you could write a file from one impalad, and S3Guard will store an entry for that file in DynamoDB, then another impalad could try to read that file, and S3Guard will indicate the file exists, but since S3 is eventually consistent the file might not appear yet.



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 May 2019 21:27:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 4:

(6 comments)

mostly minor comments, overall LGTM pending a few comments

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@341
PS4, Line 341:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore"
Is this necessary? I think the default impl uses the NullMetadataStore already, right?


http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh@67
PS4, Line 67:       hadoop s3guard prune -seconds 1 -meta "dynamodb://${S3GUARD_DYNAMODB_TABLE}" \
is it necessary to prune a newly initialized S3Guard table?


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py@a175
PS4, Line 175: 
Should we delete the S3Client as well? It doesn't look like it is used anywhere else, right? We can remove the boto3 dependency as well, which decreases the number of Python dependencies we have.


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@158
PS4, Line 158:   def __init__(self, filesystem_type="HDFS"):
the filesystem_type is just purely for debug messages right? might be useful to mention that


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@163
PS4, Line 163:     hadoop_command = ['hadoop', 'fs'] + command
nit: hdfs instead of hadoop


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@170
PS4, Line 170:   def create_file(self, path, file_data, overwrite=True):
nit: would be nice to have some docs for these methods, e.g. mention that create_file actually writes to a temp file on the local fs first and then uploads the file to the target fs



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 May 2019 20:41:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh
File bin/jenkins/release_cloud_resources.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29
PS2, Line 29: # This is currently only implemented for s3.
> Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty...
Added tests for S3_BUCKET, S3GUARD_DYNAMODB_TABLE, and S3GUARD_DYNAMODB_REGION.


http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37
PS2, Line 37:   aws s3 rm --recursive --quiet s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR}
> This line and line 33 is useful enough that we should either do set +x or e
Added echo statements for these two.


http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62
PS2, Line 62:     if [[ "${S3GUARD_ENABLED}" = "true" ]]; then
> May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_RE
I added code to bin/impala-config.sh to require those to be set. This file sources that, so I'm relying on bin/impala-config.sh to check it.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177
PS2, Line 177:         cls.filesystem_client = S3Client(S3_BUCKET_NAME)
> I think we could coalesce this to use HDFS/Hadoop always, thereby maybe rem
I think that makes sense. Switched this to use HdfsCommandLineClient("S3") unconditionally.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178
PS2, Line 178:     elif IS_ABFS:
             :       # ABFS is implemented via HDFS command line client
             :       cls.filesystem_client = HdfsCommandLineClient("ABFS")
             :     elif IS_ADLS:
             :       cls.filesystem_client = ADLSClient(ADLS_STORE_NAME)
> Do you know which, if any, of these are getting tested these days?
Neither are routinely tested.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149
PS2, Line 149: # This client is a wrapper around the hdfs command line. This is useful for filesystems
> Use pydoc?
Done


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@170
PS2, Line 170:     f = tempfile.NamedTemporaryFile(delete=False)
> Caveat: I commented and then realized this class is just being moved, so ta
Good point, that is cleaner


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@181
PS2, Line 181:     return True
> Caveat: I commented and then realized this class is just being moved, so ta
Yeah, it seems like we should care about the return code of the hadoop call.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@213
PS2, Line 213:       fname = f['name'].split("/")[-1]
> Caveat: I commented and then realized this class is just being moved, so ta
Switched this to use os.path.basename().



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 21:27:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 22:12:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/1/bin/impala-config.sh@310
PS1, Line 310:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 23:52:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/testdata/bin/load-test-warehouse-snapshot.sh@62
PS2, Line 62:     if [[ "${S3GUARD_ENABLED}" = "true" ]]; then
May be useful to verify that S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_REGION are not empty if S3GUARD_ENABLED is "true" ?



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 20:24:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@170
PS2, Line 170:     f = tempfile.NamedTemporaryFile(delete=False)
Caveat: I commented and then realized this class is just being moved, so take this as non-blocking.

Use a context manager?

  with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
      tmp_file.write(file_data)

That calls close() for you automatically. You can later still get tmp_file.name on a closed file handle, so no need for tmp_path.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@181
PS2, Line 181:     return True
Caveat: I commented and then realized this class is just being moved, so take this as non-blocking.

Maybe do the same thing as the previous method?

  (status, stdout, stderr) = self._hadoop_fs_shell(['-mkdir', '-p', fixed_path])
  return status == 0


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@213
PS2, Line 213:       fname = f['name'].split("/")[-1]
Caveat: I commented and then realized this class is just being moved, so take this as non-blocking.

Don't if it helps in this case, but the python os lib has a lot of path manipulation helpers, e.g.:

  >>> import os
  >>> f = 'foo/bar/baz'
  >>> os.path.dirname(f)
  'foo/bar'
  >>> os.path.basename(f)
  'baz'
  >>> os.path.abspath(f)
  '/<pwd>/foo/bar/baz'



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 20:10:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

The config template changes look fine to me


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 21 May 2019 21:43:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 6: Code-Review+2

Should we file a JIRA for applying the "-d" change in bulk? Otherwise +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 23 May 2019 18:23:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13020

to look at the new patch set (#5).

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
---
M bin/generate_xml_config.py
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M infra/python/deps/requirements.txt
M testdata/bin/load-test-warehouse-snapshot.sh
A testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
D testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners_fuzz.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
D tests/util/s3_util.py
13 files changed, 316 insertions(+), 373 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 6:

Rebase to get the drop of hwx.public.repo.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 May 2019 23:04:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 22:19:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179
PS5, Line 179:     check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location])
> Makes sense. I'm wondering if we could add the -d option just for S3 (and d
Within a single test, we are mostly single threaded in terms of accessing tables where we upload files. That's the main reason I'm ok with HDFS having "-d", but since there is no correctness or performance penalty, we could leave off the "-d" for HDFS.

I agree that s3 mostly has no reason not to use "-d". I might look into a bulk change for that.



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 21 May 2019 21:55:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 00:14:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@310
PS2, Line 310:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Apr 2019 20:02:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13020/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/3/bin/impala-config.sh@339
PS3, Line 339:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py@49
PS3, Line 49: CORE_CONF = HdfsConfig(os.path.join(environ['HADOOP_CONF_DIR'], "core-site.xml"))
flake8: E305 expected 2 blank lines after class or function definition, found 1



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 21:28:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179
PS5, Line 179:     check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location])
> what was this change for?
I saw a failure here where s3guard showed the temporary file used for copying existed, but s3 still couldn't access it. I think s3guard gets us past the metadata consistency, but the actual storage is still eventually consistent? So, I'm getting rid of the extra copy and having hdfs write the file directly. I saw this once, and otherwise, builds have been stable.



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 May 2019 21:18:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4295/


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 May 2019 03:14:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13020

to look at the new patch set (#3).

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
---
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
8 files changed, 208 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@341
PS4, Line 341:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore"
> Is this necessary? I think the default impl uses the NullMetadataStore alre
Changed core-site.xml to use the python templates and pushed this in there.


http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/13020/4/testdata/bin/load-test-warehouse-snapshot.sh@67
PS4, Line 67:       hadoop s3guard prune -seconds 1 -meta "dynamodb://${S3GUARD_DYNAMODB_TABLE}" \
> is it necessary to prune a newly initialized S3Guard table?
From what I understand, hadoop s3guard init doesn't clear out a DynamoDB table if it already exists. If somehow a test configuration aborts without calling bin/jenkins/release_cloud_resources.sh (e.g. a Jenkins reboot), I prune this to clear about anything left over from previous runs.


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13020/4/tests/common/impala_test_suite.py@a175
PS4, Line 175: 
> Should we delete the S3Client as well? It doesn't look like it is used anyw
Good point, we can always revive it if we need it later. I removed it. I deferrred removing boto3 (but added a comment), because I think we are going to go through a cleanup when switching to toolchain python 2.7.


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@158
PS4, Line 158:   def __init__(self, filesystem_type="HDFS"):
> the filesystem_type is just purely for debug messages right? might be usefu
Added comment


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@163
PS4, Line 163:     hadoop_command = ['hadoop', 'fs'] + command
> nit: hdfs instead of hadoop
I renamed this class to the HadoopFsCommandLineClient. If there is a benefit to the hdfs command, we can switch, but I treat hadoop fs and hdfs dfs as mostly interchangeable, and this class is mainly for non-HDFS filesystems.


http://gerrit.cloudera.org:8080/#/c/13020/4/tests/util/hdfs_util.py@170
PS4, Line 170:   def create_file(self, path, file_data, overwrite=True):
> nit: would be nice to have some docs for these methods, e.g. mention that c
Added some function comments.



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 May 2019 17:46:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 23 May 2019 04:41:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13020

to look at the new patch set (#2).

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
---
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
8 files changed, 185 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4301/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 May 2019 23:05:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@312
PS2, Line 312:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore"
I looked into whether there were other interesting impls here and couldn't find any. hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java doesn't work across processes. If we had something that went cross-process, we wouldn't really need Dynamo in our world.


http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh
File bin/jenkins/release_cloud_resources.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29
PS2, Line 29: # This is currently only implemented for s3.
Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty...


http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37
PS2, Line 37:   aws s3 rm --recursive --quiet s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR}
This line and line 33 is useful enough that we should either do set +x or echo what we're about to do.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177
PS2, Line 177:         cls.filesystem_client = S3Client(S3_BUCKET_NAME)
I think we could coalesce this to use HDFS/Hadoop always, thereby maybe removing the S3Client code?


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178
PS2, Line 178:     elif IS_ABFS:
             :       # ABFS is implemented via HDFS command line client
             :       cls.filesystem_client = HdfsCommandLineClient("ABFS")
             :     elif IS_ADLS:
             :       cls.filesystem_client = ADLSClient(ADLS_STORE_NAME)
Do you know which, if any, of these are getting tested these days?


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149
PS2, Line 149: # This client is a wrapper around the hdfs command line. This is useful for filesystems
Use pydoc?

class Foo(bla):
  """bla bla bla"""



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Reviewed-on: http://gerrit.cloudera.org:8080/13020
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Sahil Takiar <st...@cloudera.com>
---
M bin/generate_xml_config.py
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M infra/python/deps/requirements.txt
M testdata/bin/load-test-warehouse-snapshot.sh
A testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
D testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
M tests/query_test/test_scanners_fuzz.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
D tests/util/s3_util.py
13 files changed, 316 insertions(+), 373 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, but someone else must approve
  Impala Public Jenkins: Verified
  Sahil Takiar: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Laszlo Gaal, Philip Zeyliger, David Knupp, Sahil Takiar, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13020

to look at the new patch set (#4).

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 due to some operations that are eventually
consistent. S3Guard stores extra metadata in a DynamoDB to solve
several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION - AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Update tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
---
M bin/impala-config.sh
A bin/jenkins/release_cloud_resources.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
D tests/util/abfs_util.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
8 files changed, 209 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/13020/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 May 2019 18:31:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/3/tests/util/hdfs_util.py@49
PS3, Line 49: CORE_CONF = HdfsConfig(os.path.join(environ['HADOOP_CONF_DIR'], "core-site.xml"))
> flake8: E305 expected 2 blank lines after class or function definition, fou
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 21:29:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/4/bin/impala-config.sh@339
PS4, Line 339:   export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 May 2019 21:30:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 6: Code-Review+1

Carry +1


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 May 2019 23:04:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4295/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 21 May 2019 21:55:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2851/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Apr 2019 20:45:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

LGTM +1

I like the idea of using a Python script to generate core-site.xml rather than using a .tmpl file. I think it would be good if someone else more familiar with the mini-cluster setup took a look as well though, since this is a new change.

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/13020/5/tests/query_test/test_scanners_fuzz.py@179
PS5, Line 179:     check_call(['hdfs', 'dfs', '-copyFromLocal', '-d'] + to_copy + [fuzz_table_location])
what was this change for?



-- 
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 May 2019 20:34:31 +0000
Gerrit-HasComments: Yes