You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2017/03/15 17:11:39 UTC

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6406

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 653 insertions(+), 298 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46:     private ConcurrentHashMap<String, Short> storageUuidToDiskId =
> add trailing _ to member vars
?


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 161:      * Synthesizes the block metadata of a file represented by 'fileStatus' that resides
> Done
i meant: how does it "synthesize" it? e.g., by creating artificial blocks on some boundary.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112:         ListMap<TNetworkAddress> hostIndex, FileDescStats stats) throws IOException {
numUnknown...?


Line 275:     }
provide more information (such as the fact that it's used in conjunction with an FbFileBlock...)


Line 283:       String[] ip_port = location.split(":");
"Constructs an FbFileBlock..."?


Line 291: 
numUnknown...?


Line 305:     // Use ~REPLICA_HOST_IDX_MASK to extract the cache info (stored in MSB).
doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), loc.gethosts()[i])?


Line 327: 
"in the underlying...starts": fine to leave out, that's implied by fb semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:         // Find the partition that this file belongs (if any).
numUnknown...?

does Reference<long> also work?


Line 741:       HashMap<Path, List<HdfsPartition>> partsByPath) {
is this from a rebase?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
> Looks like debug printing. Remove? (below too)
Why?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS2, Line 67: storageIdtoInt
> (Unrelated to your change). Looks obsolete.
Done


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 119:       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
> Preconditions.checkState(isBlockMdSupported(fs))?
Done


PS2, Line 124: } else {
             :         locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
             :       }
> Do we really need this? Per my understanding, we only fetch LocatedFileStat
If fileStatus is not an instance of LocatedFileStatus we don't have the block locations. E.g. during refresh.


Line 141:         ListMap<TNetworkAddress> hostIndex) {
> Preconditions.checkState(!isBlockMdSupported(fs))
We don't have the filesystem here.


PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx);
> - replicaHostIdxs is [ushort] as per fbs definition in which case idx shoul
In Java there is no unsigned short. Hence, ushort is stored as in int in flatbuffer java-generated code. I like the two masks as it makes the intention more explicit.


Line 463:   public static class FileDescStats {
> For supportability purposes, I'm wondering if we should track the remote_ne
That's a good point. I plan on augmenting this in a separate patch along with other useful stats.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 344: =
> nit: spacing
Done


Line 365:   private static String findCommonPrefix(String a, String b) {
> Use Strings.commonPrefix() from guava?
Good point. Didn't know this existed.


Line 1933:       Collections.sort(orderedFds, new Comparator<FileDescriptor>() {
> Can we just make FileDescriptor implement Comparator<FileDescriptor> and ad
How are we going to call decompressFileName that references a structure (file name prefixes) in HdfsTable if we do that?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> What is the motivation to track this? A quick look at the places it is used
To avoid initializing data structures that aren't used in the catalog. Tests also need to access this because of the different way we have for loading metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
> Any benchmark numbers? Just curious. Also I think it is good to add them he
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
> Trailing white spaces at multiple places in this file.
Done


Line 45:     set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
> Can this be moved outside the loop like JAVA_FE_ARGS
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
> extraneous white spaces.
Done


PS1, Line 54: file_name_prefix_idx
> Didn't understand what this prefix means till I read HdfsTable.fileNamePref
Done


Line 70: root_type FbFileDesc;
> What is the significance of this? I'm reading about FlatBuffers for the fir
Yeah, it is mostly used if you read from JSON. Removed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83:         storageIdGenerator.put(host, new Short(diskId));
> Drive-by comment: using 'new' here is pessimising memory consumption. If yo
Good point thanks. Done


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS1, Line 318: REPLICA_HOST_IDX_MASK
> Looks like it used for UNMASKing, rename may be?
It's a bit mask, so I'd rather keep the name as such :)


Line 318:     private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
> Document these?
Done


PS1, Line 357: getHostIdx
> how about just making hostIdx_ a short rather than typecasting everytime?
Done


PS1, Line 360: REPLICA_HOST_CACHE_MASK
> Shouldn't this be short rather than int? May be it works fine this way, jus
For convenience. Java short is signed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315:         Preconditions.checkNotNull(fd);
> I don't think this FileDescriptor.create() can ever return null? If yes, wo
Done


Line 338:    * Computes the common prefix between 'fileName' and 'prevFileName' and returns a
> I like the idea but I'm wondering if this kind of optimization is a little 
That's not entirely true. Multiple filenames from a single write do have a common prefix.


Line 420:           prefixCompressFileName(currentFileName, prevFileName);
> Also I think it may not be optimal to construct the prefixes this way by co
In principal you're right, this is not the best way to compress strings. However, due to the way these file names are generated (random) this is not meant to be a generic file name compressor. Here we take advantage of the pattern that file names of a single write exhibit to avoid storing the common prefix multiple times.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = EnumSet.of(
> May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by
Not sure how this ended up here. Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 36:   offset: long = 0 (id: 0);
Why a default value? Seems potentially dangerous.

Not for this change, but I'm thinking we don't even need to store the offset. If we know the block size, than we can derive the offset (assuming the list of file blocks is ordered by offset). Might be worth adding a TODO or recording that idea somewhere.


Line 40:   length: long = -1 (id: 1);
Seems redundant, why keep it?


Line 65:   compression: FbCompression (id: 3);
Will FlatBuffers add padding to align members? Ideally, we'd optimize for space and not access speed.


http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
Nice


Line 296:   10: optional list<string> file_name_prefixes
Is this required for the move to flat buffers? I think we should consider an HdfsPathSet abstraction that assigns path ids and internally compresses the underlying strings. There's a lot of manual lookups and stitching in the current code. I don't feel too strongly about whether we should do that now, or clean up the code later.


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFbCompression() {
toFlatBuffer()?


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 126:         locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
I think it would be better for now to keep all the code that fetches information from external systems in one place (HdfsTable). Splitting up the loading and delegating to several classes may make sense, but that probably requires significant surgery, and the current fetching code is very much centralized (we iterate over all files of in a table).

The loading code in this patch is more confusing to me than before. The meaning of some verbs like load/create is less clear.

If you agree with that direction, we may not need a FileDescriptor class at all, and can only rely on the FB to hold the data. It may still make sense to have a FileDescBuilder which you can use to construct a FbFileDesc.


Line 227:             loc.getNames().length);
Another case where we might be calling out to the NN.


Line 321:     private static int REPLICA_HOST_CACHE_MASK = 0x8000;
Less code and more readable to have one var with (1 << 15). The places that need to & the mask can bit-wise invert, i.e. (x & ~MASK).


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 189:   // List of file name prefixes.
Sorted list of file name prefixes?


Line 309:         Pair<Integer, String> compressedFileName =
How predictable is the compression behavior? Do we iterate over the files in lexicographical order for both HDFS and S3?
I'm worried about the case where an "invalidate metadata" suddenly leads to a higher memory requirement even if no files/partitions have changed.


Line 340:    * the suffix is equal to 'fileName'.
Should mention that this function expects the fileNames to be sorted.


Line 346:     String commonPrefix = Strings.commonPrefix(prevFileName, fileName);
I'm wondering if it's better to first check the last common prefix instead of the prefix between the current and prev file name. In the current version it seems like the list of prefixes could contain strings which are a common prefix of another one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 5:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 31:   LZ4
> zlib?
Done


Line 38:   offset: long (id: 0);
> why not specify a default, given that most will be 0?
Done


Line 51:   disk_ids: [ushort] (id: 3);
> is_cached?
This is embedded in MSB of replica_host_idxs.


http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
> mention where to find the fb definition
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 78:           diskId = (short) (storageIdGenerator.get(host) + 1);
> at least assert that you don't get an overflow here
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFlatBuffer() {
> should we generally settle on toFb for functions like these? brevity is a v
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 115:       int[] blockOffsets = createBlockMetadata(fbb, blockLocations, hostIndex, stats);
> i couldn't tell from the name that this creates an array of fbs. call it so
Done


Line 126:     public static FileDescriptor createWithSynthesizedBlockMetadata(FileStatus fileStatus,
> feel free to generally abbreviate metadata as md, to make this very long na
Done


PS5, Line 141: blockOffsets
> would also be good to name this so it's immediately clear what it is (ie, e
Done


Line 161:      * Synthesizes the block metadata of a file represented by 'fileStatus' that resides
> you don't say what that block metadata contains.
Done


Line 182:         List<BlockReplica> replicas = Lists.newArrayList(
> same comment about intermediate objects as below
Done


Line 208:         List<BlockReplica> replicas = Lists.newArrayListWithExpectedSize(
> hm, it would be nice not to have to construct these intermediate objects. p
Done


Line 224:     public THdfsCompression getFileCompression() {
> does this need to return a thrift struct?
Done


Line 231:     public FileBlock getFileBlock(int idx) {
> why not just return the fbfileblock and save yourself the object constructi
Done


Line 329:      * Loads the metadata of a file block from its block location metadata 'location' and
> what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb
Done


Line 346:         replicaIdx = replica.isCached() ?
> please describe this encoding in the .fb file.
Done


Line 360:       FbFileBlock.addOffset(fbb, location.getOffset());
> why not just pass in the offset/length directly and avoid constructing a Bl
Done


Line 402:     public boolean hasCachedReplica() { return hasCachedReplica_; }
> instead of adding member vars, why not make all functions be static and tak
Done


Line 449:   public static class FileDescStats {
> since this only keeps track of a single number, consider using a Long inste
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 134:   // if the table is stored in the catalog server.
> what does that mean? i thought all tables are stored in the catalog server.
The same class is used both in CatalogServiceCatalog (catalod) and ImpaladCatalog (impalad). However, when the table is stored in the catalog we don't need to populate data structure used for static partition pruning. I agree "stored in the catalog" is not the best description.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> is this part of your changes?
Yes, see previous comment in HdfsTable.java.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
Looks like debug printing. Remove? (below too)


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS2, Line 67: storageIdtoInt
(Unrelated to your change). Looks obsolete.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 119:       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
Preconditions.checkState(isBlockMdSupported(fs))?


PS2, Line 124: } else {
             :         locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
             :       }
Do we really need this? Per my understanding, we only fetch LocatedFileStatus'es going forward as we don't want to do another RPC to getFileBlockLocations(). In another words, is there some caller that requires this?


Line 141:         ListMap<TNetworkAddress> hostIndex) {
Preconditions.checkState(!isBlockMdSupported(fs))


PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx);
- replicaHostIdxs is [ushort] as per fbs definition in which case idx should be a short? That REPLICA_HOST_CACHE_MASK could be made made short and avoid typecasting in L361. 

- Also that way we could use only REPLICA_HOST_CACHE_MASK and use ~REPLICA_HOST_CACHE_MASK for unmasking instead of a separate REPLICA_HOST_IDX_MASK. Feel free to ignore this if you think that affects readability.


Line 463:   public static class FileDescStats {
For supportability purposes, I'm wondering if we should track the remote_network_addresses and present a rolled up version of stats so that we can know which DNs to look at when we see -ve disk IDs.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 344: =
nit: spacing


Line 365:   private static String findCommonPrefix(String a, String b) {
Use Strings.commonPrefix() from guava?


Line 1933:       Collections.sort(orderedFds, new Comparator<FileDescriptor>() {
Can we just make FileDescriptor implement Comparator<FileDescriptor> and add a compare() method there. I think that'd be cleaner.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
What is the motivation to track this? A quick look at the places it is used, it seems like they are only called when it actually true (for example resetPartitions()). Are there callers when its false too? Like tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 4:

(11 comments)

Seems fine. Nothing major left to do here.

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 40:   // Total length of the block
> You mean why not compute it on the fly by iterating on the file blocks?
It can be computed based on the offset, file length, and block size. No need to even iterate.

All but the last block have length == block size. You can determine last block based on its offset and compute the length with file length - offset.

We don't have to do this now, but seems simple enough to do later.


Line 65:   last_modification_time: long (id: 3);
> Hm, not sure I understand what the ask is here? Should I try to merge this 
I was mostly wondering if the field order matters. For example, could we save some space by moving the compression field last? Having it here in the middle, followed by a long seems like fb "might" add padding to align the last_modification_time member.


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 227:           FbCompression.name(fbFileDescriptor_.compression()));
> I don't think any of these statements is making a call to NN. Which one are
I though loc.getNames() might because it can throw an IOException. I looked at the HDFS code and I don't see any reason why that method should throw... Bizarre.


http://gerrit.cloudera.org:8080/#/c/6406/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 199:     private static int[] loadBlockMetadata(FlatBufferBuilder fbb,
createBlockMetadata()?

I'm not sure what the "load" was supposed to mean


Line 303:     private final boolean isCached_;
hasCachedReplica_


Line 372:     private static short[] loadDiskIds(BlockLocation location, FileDescStats stats) {
mapDiskIds() or createDiskIds()?

I'm not sure what "load" means here.


Line 380:         return Shorts.toArray(Collections.<Short>emptyList());
why not:

return new short[0];


Line 387:       List<Short> diskIDs = new ArrayList<Short>(storageIds.length);
Why not a short[]? The size is fixed and you can even use the 'i' index in the loop.


Line 403:     public boolean isCached() { return isCached_; }
hasCachedReplica()


http://gerrit.cloudera.org:8080/#/c/6406/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 110:     return storedInImpaladCatalogCache_ || RuntimeEnv.INSTANCE.isTestEnv();
Comment on the test env condition


Line 287:     // Indicate that this table is stored in an Impalad catalog cache.
no need for comment


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 648 insertions(+), 304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83:         storageIdGenerator.put(host, new Short(diskId));
Drive-by comment: using 'new' here is pessimising memory consumption. If you use Short.valueof() or just let Java automatically box it, it will use a built-in cache for values -128 to 127, e.g. see http://docs.oracle.com/javase/7/docs/api/java/lang/Short.html#valueOf(short)


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#9).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 560 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 8: Code-Review+2

Rebase and fix minor test. Keep Marcel's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 5:

(23 comments)

when staffan did the origin analysis, he found problems with both the total amount of memory needed as well as the rate of object creation. it would be good to try to reduce the latter in your current code even more, by accessing the fb structures directly instead of wrapping them with objects.

http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 31:   LZ4
zlib?


Line 38:   offset: long (id: 0);
why not specify a default, given that most will be 0?


Line 51:   disk_ids: [ushort] (id: 3);
is_cached?


http://gerrit.cloudera.org:8080/#/c/6406/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
reference which one exactly


http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
mention where to find the fb definition


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46:     private ConcurrentHashMap<String, Short> storageUuidToDiskId =
add trailing _ to member vars


Line 78:           diskId = (short) (storageIdGenerator.get(host) + 1);
at least assert that you don't get an overflow here


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFlatBuffer() {
should we generally settle on toFb for functions like these? brevity is a virtue...


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 115:       int[] blockOffsets = createBlockMetadata(fbb, blockLocations, hostIndex, stats);
i couldn't tell from the name that this creates an array of fbs. call it something like createFbFileBlocks or createFbFileBlockVector to indicate how this relates to the fb structures (and the fact that it outputs fbs)?


Line 126:     public static FileDescriptor createWithSynthesizedBlockMetadata(FileStatus fileStatus,
feel free to generally abbreviate metadata as md, to make this very long name a bit shorter.


PS5, Line 141: blockOffsets
would also be good to name this so it's immediately clear what it is (ie, embed the fb struct name).

you can also apply that to other variable names, ie, use a deterministic naming algorithm to make it immediately clear what the things are. i used that approach in my proof-of-concept test code, i think it made the code reasonably easy to follow.


Line 161:      * Synthesizes the block metadata of a file represented by 'fileStatus' that resides
you don't say what that block metadata contains.


Line 182:         List<BlockReplica> replicas = Lists.newArrayList(
same comment about intermediate objects as below


Line 208:         List<BlockReplica> replicas = Lists.newArrayListWithExpectedSize(
hm, it would be nice not to have to construct these intermediate objects. pass all the params that you need for construction into fileblock.createfbfileblock and then populate the fbfileblock directly?


Line 224:     public THdfsCompression getFileCompression() {
does this need to return a thrift struct?


Line 231:     public FileBlock getFileBlock(int idx) {
why not just return the fbfileblock and save yourself the object construction?


Line 329:      * Loads the metadata of a file block from its block location metadata 'location' and
what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb'?


Line 346:         replicaIdx = replica.isCached() ?
please describe this encoding in the .fb file.


Line 360:       FbFileBlock.addOffset(fbb, location.getOffset());
why not just pass in the offset/length directly and avoid constructing a BlockLocation?


Line 402:     public boolean hasCachedReplica() { return hasCachedReplica_; }
instead of adding member vars, why not make all functions be static and take an FbFileBlock parameter. that way, you save yourself more object construction.


Line 449:   public static class FileDescStats {
since this only keeps track of a single number, consider using a Long instead, that's less indirect.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 134:   // if the table is stored in the catalog server.
what does that mean? i thought all tables are stored in the catalog server.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
is this part of your changes?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 40:   // Total length of the block
> It can be computed based on the offset, file length, and block size. No nee
Good point. Left a TODO.


Line 65:   last_modification_time: long (id: 3);
> I was mostly wondering if the field order matters. For example, could we sa
Interesting thought. Left a TODO to investigate later.


http://gerrit.cloudera.org:8080/#/c/6406/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 199:     private static int[] loadBlockMetadata(FlatBufferBuilder fbb,
> createBlockMetadata()?
Done


Line 303:     private final boolean isCached_;
> hasCachedReplica_
Done


Line 372:     private static short[] loadDiskIds(BlockLocation location, FileDescStats stats) {
> mapDiskIds() or createDiskIds()?
Done


Line 380:         return Shorts.toArray(Collections.<Short>emptyList());
> why not:
hahah. I wonder if I could add another unnecessary step :P


Line 387:       List<Short> diskIDs = new ArrayList<Short>(storageIds.length);
> Why not a short[]? The size is fixed and you can even use the 'i' index in 
Done


Line 403:     public boolean isCached() { return isCached_; }
> hasCachedReplica()
Done


http://gerrit.cloudera.org:8080/#/c/6406/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 110:     return storedInImpaladCatalogCache_ || RuntimeEnv.INSTANCE.isTestEnv();
> Comment on the test env condition
Done


Line 287:     // Indicate that this table is stored in an Impalad catalog cache.
> no need for comment
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/559/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Impala Public Jenkins, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#10).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
14 files changed, 572 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Reviewed-on: http://gerrit.cloudera.org:8080/6406
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
14 files changed, 572 insertions(+), 323 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46:     // the storage ID of a particular disk is unique across all the nodes in the cluster.
> ?
Oops, sorry missed that one. Done


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112:         ListMap<TNetworkAddress> hostIndex, Reference<Long> unknownDiskIds)
> numUnknown...?
Done


Line 275:    * File Block metadata
> provide more information (such as the fact that it's used in conjunction wi
Done


Line 283:      * Constructs the metadata of a file block from its block location metadata
> "Constructs an FbFileBlock..."?
Done


Line 291:         ListMap<TNetworkAddress> hostIndex, Reference<Long> unknownDiskIds)
> numUnknown...?
Done


Line 305:         boolean isReplicaCached = cachedHosts.contains(loc.getHosts()[i]);
> doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), l
Not in Guava, ArrayUtils is a class in Apache Commons.


Line 327:      * using 'fbb' and returns the offset in the underlying buffer where the encoded file
> "in the underlying...starts": fine to leave out, that's implied by fb seman
Agreed but maybe it's good to explicitly mention it for future readers that may not be familiar with FB semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:       Reference<Long> unknownDiskIds = new Reference<Long>();
> numUnknown...?
Unfortunately not, generics don't work with primitive types. Miss C++ :)?


Line 741:    * Helper method to load the partition file metadata from scratch. This method is
> is this from a rebase?
Yes, this is from the performance improvements of REFRESH.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#5).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 564 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#8).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 563 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 9: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 654 insertions(+), 301 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 9: Code-Review+2

Forgot to push some change.

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/553/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 3:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 36:   offset: long = 0 (id: 0);
> Why a default value? Seems potentially dangerous.
Done


Line 40:   length: long = -1 (id: 1);
> Seems redundant, why keep it?
You mean why not compute it on the fly by iterating on the file blocks?


Line 65:   compression: FbCompression (id: 3);
> Will FlatBuffers add padding to align members? Ideally, we'd optimize for s
Hm, not sure I understand what the ask is here? Should I try to merge this with some other field or something else?


http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
> Leave a todo to put this in a KRPC sidecar :) No need to pay serialization 
Done


Line 218:   1: required binary file_desc_data
> Nice
Done


Line 296:   10: optional list<string> file_name_prefixes
> Is this required for the move to flat buffers? I think we should consider a
No, this is definitely not needed now. Actually, I feel the current implementation is too flaky and not easy to explain. I'll remove it for now and leave a TODO. We can revisit the file name compression later.


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFbCompression() {
> toFlatBuffer()?
Done


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 126:         locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
> I think it would be better for now to keep all the code that fetches inform
Done


Line 227:             loc.getNames().length);
> Another case where we might be calling out to the NN.
I don't think any of these statements is making a call to NN. Which one are you talking about?


Line 321:     private static int REPLICA_HOST_CACHE_MASK = 0x8000;
> Less code and more readable to have one var with (1 << 15). The places that
Done


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 189:   // List of file name prefixes.
> Sorted list of file name prefixes?
This was removed.


Line 309:         Pair<Integer, String> compressedFileName =
> How predictable is the compression behavior? Do we iterate over the files i
Removed.


Line 340:    * the suffix is equal to 'fileName'.
> Should mention that this function expects the fileNames to be sorted.
Removed.


Line 346:     String commonPrefix = Strings.commonPrefix(prevFileName, fileName);
> I'm wondering if it's better to first check the last common prefix instead 
Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#6).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 556 insertions(+), 320 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#4).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 556 insertions(+), 313 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
Any benchmark numbers? Just curious. Also I think it is good to add them here.


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
Trailing white spaces at multiple places in this file.


Line 45:     set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
Can this be moved outside the loop like JAVA_FE_ARGS


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
extraneous white spaces.


PS1, Line 54: file_name_prefix_idx
Didn't understand what this prefix means till I read HdfsTable.fileNamePrefixes_. I think it should be pointed out here.


Line 70: root_type FbFileDesc;
What is the significance of this? I'm reading about FlatBuffers for the first time, so pardon my ignorance. From what I understand, this isn't required in strongly typed systems. Please correct me if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 318:     private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
Document these?


PS1, Line 318: REPLICA_HOST_IDX_MASK
Looks like it used for UNMASKing, rename may be?


PS1, Line 357: getHostIdx
how about just making hostIdx_ a short rather than typecasting everytime?


PS1, Line 360: REPLICA_HOST_CACHE_MASK
Shouldn't this be short rather than int? May be it works fine this way, just want to be sure.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315:         Preconditions.checkNotNull(fd);
I don't think this FileDescriptor.create() can ever return null? If yes, worth removing Preconditions check here and elsewhere.


Line 338:    * Computes the common prefix between 'fileName' and 'prevFileName' and returns a
I like the idea but I'm wondering if this kind of optimization is a little premature. Especially given the filenames generated by Impala writes are typically random UUID style strings and would likely not qualify this optimization. Thoughts? Please correct me if I got this wrong.


Line 420:           prefixCompressFileName(currentFileName, prevFileName);
Also I think it may not be optimal to construct the prefixes this way by comparing the previous name with the current name. May be we should huffman style encoding that minimizes the no. of bits across the entire set of file names?


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = EnumSet.of(
May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by making it public?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#7).

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 559 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
> Nice
Leave a todo to put this in a KRPC sidecar :) No need to pay serialization for the string itself!


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
> Why?
Never mind, it looked to me like debug printing. Please ignore.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS2, Line 124:   locations = locatedFileStatus.getBlockLocations();
             :       } else {
             :        
> If fileStatus is not an instance of LocatedFileStatus we don't have the blo
Oops thanks for clarifying.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 10: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No