You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/03/18 16:09:57 UTC

[GitHub] [incubator-pinot] sunithabeeram commented on a change in pull request #3979: Track Indexed timestamp across consuming segments

sunithabeeram commented on a change in pull request #3979: Track Indexed timestamp across consuming segments
URL: https://github.com/apache/incubator-pinot/pull/3979#discussion_r266520116
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
 ##########
 @@ -204,20 +205,22 @@ public boolean index(GenericRow row) {
     // If metrics aggregation is enabled and if the dimension values were already seen, this will return existing docId,
     // else this will return a new docId.
     int docId = getOrCreateDocId(dictIdMap);
-
+    boolean canTakeMore = false;
     // docId == numDocs implies new docId.
     if (docId == numDocs) {
       // Add forward and inverted indices for new document.
       addForwardIndex(row, docId, dictIdMap);
       addInvertedIndex(docId, dictIdMap);
       // Update number of document indexed at last to make the latest record queryable
-      return _numDocsIndexed++ < _capacity;
+      canTakeMore = _numDocsIndexed++ < _capacity;
     } else {
-      Preconditions
-          .checkState(_aggregateMetrics, "Invalid document-id during indexing: " + docId + " expected: " + numDocs);
+      Preconditions.checkState(_aggregateMetrics, "Invalid document-id during indexing: " + docId + " expected: " + numDocs);
       // Update metrics for existing document.
-      return aggregateMetrics(row, docId);
+      canTakeMore = aggregateMetrics(row, docId);
     }
+    // update indexing time
+    _lastIndexedTimestamp = System.currentTimeMillis();
 
 Review comment:
   The user defined SLA will account for any expected delays in the incoming stream itself. Just going by the poll time can miss cases where we have issues with indexing (there have been a few in the past; for ex, issues with aggregatable-metrics for multi-value columns etc).
   
   The timestamp in the record can have a very coarse granularity, for ex, days - which isn't ideal for the granularity we would like to track freshness at. Also, see motivation notes about the lack of other usable timestamps.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org