Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/04/06 18:14:13 UTC

[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

    [ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481362#comment-14481362 ] 

Jason Lowe commented on YARN-3448:
----------------------------------

Thanks for the patch, Jonathan.  Interesting approach, and this should drastically improve performance for retention processing.  Some comments on the patch so far:

I think the code would be easier to follow if we didn't abuse Map.Entry as a pair class to associate a WriteBatch with its corresponding DB.  Creating a custom utility class to associate these would make the code a lot more readable than always needing to deduce that getKey() is a database and getValue() is a WriteBatch.
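A minimal sketch of what such a holder could look like (the class and accessor names here are made up for illustration, not taken from the patch):
{code}
// Hypothetical holder pairing a rolling DB instance with its pending
// WriteBatch, so callers read getDB()/getWriteBatch() instead of
// getKey()/getValue().
private static class RollingWriteBatch implements Closeable {
  private final DB db;
  private final WriteBatch writeBatch;

  RollingWriteBatch(DB db, WriteBatch writeBatch) {
    this.db = db;
    this.writeBatch = writeBatch;
  }

  DB getDB() {
    return db;
  }

  WriteBatch getWriteBatch() {
    return writeBatch;
  }

  @Override
  public void close() throws IOException {
    writeBatch.close();
  }
}
{code}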

The underlying database throws runtime exceptions, and the existing leveldb store translates these to IOExceptions.  I think we want to do the same here.  For example, put has a try..finally block with no catch clauses, yet the method does not declare that it throws anything like IOException.  Arguably it should throw IOException when the database has an error.
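For instance, something along these lines (just a sketch; the surrounding method is illustrative, not lifted from the patch):
{code}
// Sketch: translate leveldb's unchecked DBException into a checked
// IOException, matching the existing store's behavior.
void put(byte[] key, byte[] value) throws IOException {
  try {
    db.put(key, value);
  } catch (DBException e) {
    throw new IOException(e);
  }
}
{code}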

The original leveldb code had locking around entities but I don't see it here.  Since updating entities often involves a read/modify/write operation on the database, are we sure it's OK to remove that synchronization?
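If we do need to keep it, a minimal sketch of per-entity locking could look like this (one possible approach only, not the original LockMap code; lock cleanup is omitted for brevity):
{code}
// Sketch: per-entity lock objects so concurrent read/modify/write cycles
// on the same entity don't interleave.
private final ConcurrentMap<EntityIdentifier, Object> entityLocks =
    new ConcurrentHashMap<EntityIdentifier, Object>();

private Object getEntityLock(EntityIdentifier id) {
  Object lock = new Object();
  Object existing = entityLocks.putIfAbsent(id, lock);
  return existing != null ? existing : lock;
}

// usage:
// synchronized (getEntityLock(entityId)) {
//   ... read existing entity state, modify, and write back ...
// }
{code}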

computeCheckMillis says it needs to be called synchronously, but it looks like it can be called without a lock via a number of routes, e.g.:
put -> putIndex -> computeCurrentCheckMillis -> computeCheckMillis
put -> putEntities -> computeCurrentCheckMillis -> computeCheckMillis

These should probably be debug statements; otherwise I think they could be quite spammy in the server log.  Also, the latter one will always be followed by the former because of the loop, so it may not be that useful in practice even at the debug level (see the sketch after the quoted lines):
{code}
+        LOG.info("Trying the  db" + db);
...
+        LOG.info("Trying the previous db" + db);
{code}
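Something like the following would keep them out of the server log by default (sketch only):
{code}
// Sketch: guard and demote the lookup-loop logging to debug level.
if (LOG.isDebugEnabled()) {
  LOG.debug("Trying db " + db);
}
{code}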

This will NPE on entityUpdate if db is null, even though the code explicitly checks for that possibility (a null-safe sketch follows the quoted code):
{code}
+      Map.Entry<DB, WriteBatch> entityUpdate = entityUpdates.get(roundedStartTime);
+      if (entityUpdate == null) {
+        DB db = entitydb.getDBForStartTime(startAndInsertTime.startTime);
+        if (db != null) {
+          WriteBatch writeBatch = db.createWriteBatch();
+          entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(db, writeBatch);
+          entityUpdates.put(roundedStartTime, entityUpdate);
+        };
+      }
+      WriteBatch writeBatch = entityUpdate.getValue();
{code}
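A null-safe sketch of roughly what I'd expect here; how we skip or report the entity when no rolling db covers its start time is up for discussion:
{code}
// Sketch: bail out (or record a TimelinePutError) instead of falling
// through to entityUpdate.getValue() when no rolling db is found.
Map.Entry<DB, WriteBatch> entityUpdate = entityUpdates.get(roundedStartTime);
if (entityUpdate == null) {
  DB db = entitydb.getDBForStartTime(startAndInsertTime.startTime);
  if (db == null) {
    return;  // or add an error to the response and skip this entity
  }
  entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
      db, db.createWriteBatch());
  entityUpdates.put(roundedStartTime, entityUpdate);
}
WriteBatch writeBatch = entityUpdate.getValue();
{code}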

In the following code we look up relatedEntityUpdate, but after checking whether it's null we never use it again.  I think we're supposed to be setting up relatedEntityUpdate in that block when it's null rather than re-assigning entityUpdate, and then after the null check we should be using relatedEntityUpdate rather than entityUpdate to get the proper write batch (sketch after the quoted code).
{code}
+            Map.Entry<DB, WriteBatch> relatedEntityUpdate = entityUpdates.get(relatedRoundedStartTime);
+            if (relatedEntityUpdate == null) {
+              DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
+              if (db != null) {
+                WriteBatch relatedWriteBatch = db.createWriteBatch();
+                entityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
+                    db, relatedWriteBatch);
+                entityUpdates.put(relatedRoundedStartTime, entityUpdate);
+              }
+              ;
+            }
+            WriteBatch relatedWriteBatch = entityUpdate.getValue();
{code}
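I'd expect something closer to this (sketch; the continue assumes we're inside the related-entities loop):
{code}
// Sketch: assign and use relatedEntityUpdate, and guard the null-db case.
Map.Entry<DB, WriteBatch> relatedEntityUpdate =
    entityUpdates.get(relatedRoundedStartTime);
if (relatedEntityUpdate == null) {
  DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
  if (db == null) {
    continue;  // no rolling db covers this related entity's start time
  }
  relatedEntityUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(
      db, db.createWriteBatch());
  entityUpdates.put(relatedRoundedStartTime, relatedEntityUpdate);
}
WriteBatch relatedWriteBatch = relatedEntityUpdate.getValue();
{code}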

This code is commented out.  Should it have been deleted, or is there something left to do here with respect to related entities?
{code}
+    /*
+    for (EntityIdentifier relatedEntity : relatedEntitiesWithoutStartTimes) {
+      try {
+        StartAndInsertTime relatedEntityStartAndInsertTime =
+            getAndSetStartTime(relatedEntity.getId(), relatedEntity.getType(),
+            readReverseOrderedLong(revStartTime, 0), null);
+        if (relatedEntityStartAndInsertTime == null) {
+          throw new IOException("Error setting start time for related entity");
+        }
+        byte[] relatedEntityStartTime = writeReverseOrderedLong(
+            relatedEntityStartAndInsertTime.startTime);
+          // This is the new entity, the domain should be the same
+        byte[] key = createDomainIdKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime);
+        writeBatch.put(key, entity.getDomainId().getBytes());
+        ++putCount;
+        writeBatch.put(createRelatedEntityKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime,
+            entity.getEntityId(), entity.getEntityType()), EMPTY_BYTES);
+        ++putCount;
+        writeBatch.put(createEntityMarkerKey(relatedEntity.getId(),
+            relatedEntity.getType(), relatedEntityStartTime),
+            writeReverseOrderedLong(relatedEntityStartAndInsertTime
+                .insertTime));
+        ++putCount;
+      } catch (IOException e) {
+        LOG.error("Error putting related entity " + relatedEntity.getId() +
+            " of type " + relatedEntity.getType() + " for entity " +
+            entity.getEntityId() + " of type " + entity.getEntityType(), e);
+        TimelinePutError error = new TimelinePutError();
+        error.setEntityId(entity.getEntityId());
+        error.setEntityType(entity.getEntityType());
+        error.setErrorCode(TimelinePutError.IO_EXCEPTION);
+        response.addError(error);
+      }
+    }
+    */
{code}

Similar NPE potential on indexUpdate if db is null:
{code}
+      Map.Entry<DB, WriteBatch> indexUpdate = indexUpdates.get(roundedStartTime);
+      if (indexUpdate == null) {
+        DB db = indexdb.getDBForStartTime(startAndInsertTime.startTime);
+        if (db != null) {
+          WriteBatch writeBatch = db.createWriteBatch();
+          indexUpdate = new AbstractMap.SimpleImmutableEntry<DB, WriteBatch>(db, writeBatch);
+          indexUpdates.put(roundedStartTime, indexUpdate);
+        };
+      }
+      WriteBatch writeBatch = indexUpdate.getValue();
{code}

getAndSetStartTime and checkStartTimeInDb comments refer to obtaining a lock on the entity but no entity-level locking appears to be used.

Nit: put and putWithNoDomainId should be refactored to call a parameterized form of the common code since they are so similar.  That would also help fix the inconsistency where one logs at debug and the other at info (sketch below).
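Roughly along these lines (sketch only; the helper name, parameter, and signatures are invented, not from the patch):
{code}
// Sketch: one parameterized helper shared by both entry points, which also
// forces a single, consistent log level.
private void putInternal(TimelineEntity entity, TimelinePutResponse response,
    boolean hasDomainId) throws IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Storing entity " + entity.getEntityId());
  }
  // ... shared write logic, branching on hasDomainId only where needed ...
}

public void put(TimelineEntity entity, TimelinePutResponse response)
    throws IOException {
  putInternal(entity, response, true);
}

public void putWithNoDomainId(TimelineEntity entity,
    TimelinePutResponse response) throws IOException {
  putInternal(entity, response, false);
}
{code}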

Nit: there are some extraneous semicolons in the patch, either after braces or on lines by themselves.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> ------------------------------------------------------
>
>                 Key: YARN-3448
>                 URL: https://issues.apache.org/jira/browse/YARN-3448
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: YARN-3448.1.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is spent deleting old entities one record at a time. An exclusive write lock is held during the entire deletion phase, which in practice can take hours. If we are to relax some of the consistency constraints, other performance enhancing techniques can be employed to maximize throughput and minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. This allows each database to maximize read cache effectiveness based on its unique usage pattern. With 5 separate databases each lookup is much faster. Placing the entity and index databases on separate disks can also help with I/O.
> Rolling DBs for the entity and index DBs. 99.9% of the data is in these two sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can replace per-record DB removal with file system removal if we create a rolling set of databases that age out and can be efficiently removed. To do this we must place a constraint to always put an entity's events into its correct rolling db instance based on start time. This allows us to stitch the data back together while reading and supports artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing some records that were not flushed by the operating system during a crash, we can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than random writes. Spend some small effort arranging the writes in a way that trends towards sequential write performance over random write performance.


