You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Henry Robinson (Code Review)" <ge...@cloudera.org> on 2016/04/06 00:29:58 UTC
[Impala-CR](cdh5-trunk) IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
Henry Robinson has posted comments on this change.
Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
......................................................................
Patch Set 10:
(11 comments)
http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:
Line 839: There is no directory structure in S3, so "inheriting" permissions is not
: // possible.
S3 has per-object permissions (in the form of ACLs see http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_ACLs_UsingACLs.html). So what's to stop us mimicking inherited permissions? Even though there's no directory structure, we could insure that permissions flow down the partition hierarchy.
I'm happy to leave this as a TODO, but check my reasoning first.
Line 875: stringstream ss;
: ss << "Error(s) deleting partition directories. First error (of "
: << partition_create_ops.errors().size() << ") was: " << err.second;
While you are here, could you clean up he use of stringstream in favour of Substitute()?
http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.cc
File be/src/util/hdfs-bulk-ops.cc:
Line 111: .c_str()
not really your change, but I think you can get rid of .c_str() in all the logging messages
Line 128: string error_msg = connection_status.ok() ? GetStrErrMsg() :
put the line break here, and put the whole ternary statement on the next line if it fits
Line 156: if (connection_cache != NULL) connection_cache_ = connection_cache;
You should do this even if connection_cache is NULL - the caller might be trying to explicitly say "don't use a connection cache".
http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.h
File be/src/util/hdfs-bulk-ops.h:
Line 94: /// Constructs a new operation set.
comment is now redundant, can be removed.
Line 109: /// If abort_on_error is true, execution will finish after the first error seen.
update comment to explain connection_cache
Line 121: return connection_cache_;
one line
Line 130: Promise<bool> promise_;
Can you replace this with a CountingBarrier? I think that will be more natural and will simplify some of the logic.
http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:
Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so we compare the
: // first 6 characters. Change if we support a filesystem with a smaller scheme.
Comment is out-of-date
http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:
Line 45: IsDfsPath
I think this should be IsHdfsPath(...), because other filesystems technically implement the DFS abstraction.
--
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-HasComments: Yes