You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2017/09/08 05:41:38 UTC

[Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
......................................................................


Patch Set 1:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 288:         BuildTopicUpdates(catalog_objects.updated_objects, false);
> Is there a requirement/assumption that the deletions must come after the up
No, it doesn't matter.


Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& catalog_objects,
> IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAl
These two "do something" blocks are almost identical. I refactored the code a bit to make it a bit less flaky.


Line 333:             << catalog_object.catalog_version;
> indent
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of topic entry keys sent
> Update comment
Done


Line 164:       bool topic_deletions);
> What does this param do?
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 481:   item.key = host.ip;
> Use __set fn like above to be consistent
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
> single line
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1323:       TCatalogObject catalog_object;
> Please move this right before DeserializeThriftMsg() so we can see where it
Done


Line 1343:                      << item.key << " is "
> indentation
Done


Line 1367:         LOG(INFO) << "Deleted item: " << item.key << " version: " << catalog_object.catalog_version;
> That's a lot of logging, sure we need this at INFO level?
Sorry, that was for debugging.


Line 1398:             if (existing_object.catalog_version <= catalog_object.catalog_version) {
> Shouldn't this be < and not <=?
Done


PS1, Line 1406: batch_size_bytes
> Looks like we didn't account this earlier for deletions. This logic makes m
Done


Line 1527:       if (item.deleted) {
> Is it possible that within a single list of topic_entries there is both an 
No, it's not possible to have both an addition and a deletion in the same list of topic_entries.


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

Line 485:             item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value,
> Why not just 'item.value'? The value gets copied.
Good point. Done


Line 527:           deleted_key_size_bytes -= itr->first.size();
> += ?
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 92
> - I kind of like this separation of updates and deletions in different list
By switching to TTopicItem for topic deletions there is now significant overlap between operations performed on both added and deleted items. We could try to extract most of the common logic in separate functions and keep the existing logic in place, but for now it seems simple enough to have one loop over one list of items and separate the logic with an if statement when needed. If we had to add many of these if statements in multiple places, I'd agree that splitting the logic would have been better.


Line 97:   5: required bool is_delta;
> fix member numbering
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:   private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap();
> I might be wrong, but I think we can end up having the same object deleted 
Good catch, that was a bug indeed. Removed the second map and simplified the logic around handling the deleted objects.


Line 70:    * key 'key'.
> update comment
No longer applicable. Function is removed.


Line 90:   public synchronized void garbageCollect(long currentCatalogVersion) {
> Although these methods are synchronized, the underlying maps aren't thread 
That's not correct. The essence of having synchronized functions is exactly that, to ensure proper synchronization even though the underlying collections are not thread safe. Two threads calling garbageCollect and addRemoved cannot both proceed, one will grab the lock on the CatalogDeltaLog object and make progress while the other will simply block. In any case, this class is simplified and it only contains one collection.


Line 154:   private String toCatalogObjectKey(TCatalogObject catalogObject) {
> static?
Done


Line 163:         return "FUNCTION:" + catalogObject.getFn().getName() + "(" +
> the function name has db and name, let's print db.fnname similar to what we
Db is already included in the function name.


Line 173:       default: return null;
> IllegalStateException?
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 152:   // Log of deleted catalog objects.
> Yea, I'm trying to wrap my head around the lifecycle of delta log on a the 
Done


Line 152:   // Log of deleted catalog objects.
> What exactly does this contain? When is it garbage collected? Add comment h
Done


Line 349:   // versions >= 'fromVersion'. The catalog objects that satisfy that condition are added
> the version condition here is ">=" but the deltaLog_.retrieveObjects() uses
Done


Line 358:       deltaLog_.deleteRemovedObject(catalogDb);
> Is the deleteRemovedObject() API really needed? If there exists an object t
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1084:     resp.result.setVersion(dataSource.getCatalogVersion());
> Shouldn't we increment the catalog version for a deleted object? Otherwise 
That already happens in catalog_.removeDataSource(). We're not really consistent in the way we handle version increment and assignment (i.e. it happens in multiple places today) and this is something that we will eventually have to fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes