You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Henry Robinson (Code Review)" <ge...@cloudera.org> on 2016/04/06 00:29:58 UTC

[Impala-CR](cdh5-trunk) IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

Henry Robinson has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
......................................................................


Patch Set 10:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 839: There is no directory structure in S3, so "inheriting" permissions is not
        :           // possible.
S3 has per-object permissions (in the form of ACLs see http://docs.aws.amazon.com/AmazonS3/latest/dev/S3_ACLs_UsingACLs.html). So what's to stop us mimicking inherited permissions? Even though there's no directory structure, we could insure that permissions flow down the partition hierarchy.

I'm happy to leave this as a TODO, but check my reasoning first.


Line 875: stringstream ss;
        :           ss << "Error(s) deleting partition directories. First error (of "
        :              << partition_create_ops.errors().size() << ") was: " << err.second;
While you are here, could you clean up he use of stringstream in favour of Substitute()?


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.cc
File be/src/util/hdfs-bulk-ops.cc:

Line 111: .c_str()
not really your change, but I think you can get rid of .c_str() in all the logging messages


Line 128:     string error_msg = connection_status.ok() ? GetStrErrMsg() :
put the line break here, and put the whole ternary statement on the next line if it fits


Line 156:   if (connection_cache != NULL) connection_cache_ = connection_cache;
You should do this even if connection_cache is NULL - the caller might be trying to explicitly say "don't use a connection cache".


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-bulk-ops.h
File be/src/util/hdfs-bulk-ops.h:

Line 94:   /// Constructs a new operation set.
comment is now redundant, can be removed.


Line 109:   /// If abort_on_error is true, execution will finish after the first error seen.
update comment to explain connection_cache


Line 121:     return connection_cache_;
one line


Line 130:   Promise<bool> promise_;
Can you replace this with a CountingBarrier? I think that will be more natural and will simplify some of the logic.


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.cc
File be/src/util/hdfs-util.cc:

Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so we compare the
       :   // first 6 characters. Change if we support a filesystem with a smaller scheme.
Comment is out-of-date


http://gerrit.cloudera.org:8080/#/c/2574/10/be/src/util/hdfs-util.h
File be/src/util/hdfs-util.h:

Line 45: IsDfsPath
I think this should be IsHdfsPath(...), because other filesystems technically implement the DFS abstraction.


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-HasComments: Yes