You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "jugomezv (via GitHub)" <gi...@apache.org> on 2023/03/13 23:49:57 UTC

[GitHub] [pinot] jugomezv opened a new pull request, #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

jugomezv opened a new pull request, #10418:
URL: https://github.com/apache/pinot/pull/10418

   …ges with no events processed.
   
   Instructions:
   1. The PR has to be tagged with at least one of the following labels (*):
      1. `feature`
      2. `bugfix`
      3. `performance`
      4. `ui`
      5. `backward-incompat`
      6. `release-notes` (**)
   2. Remove these instructions before publishing the PR.
    
   (*) Other labels to consider:
   - `testing`
   - `dependencies`
   - `docker`
   - `kubernetes`
   - `observability`
   - `security`
   - `code-style`
   - `extension-point`
   - `refactor`
   - `cleanup`
   
   (**) Use `release-notes` label for scenarios like:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1470881958

   > When all events are filtered and no actual events are processed, set consumption delay to zero as we are caught up processing actual events.
   
   All events filtered in a given batch doesn't imply that we are caught up. In this scenario, the consumer returned messages , however, it was all filtered for whatever reason (is it tombstones today??). This scenario should be treated differently than when fetch messages doesn't return any messages from the consumer.  Have I misunderstood the hidden assumptions in the current realtime segment data manager's implementation ?
   
   > This does not affect metric reports in active tables that have incoming events, only inactive tables that have long sequences of filtered events.
   
   What do you mean by "inactive tables"? Are these tables that are not consuming or tables that are consuming from an empty source? Or something else?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1492733456

   failing integration test passing on my env, forced committed to rerun tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1146815948


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,12 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * We need this to determine ingestion delay when we receive only null messages (Tombstone messages)
+   * @return last metadata for a null message received by the string
+   */
+  default public StreamMessageMetadata getLastTombstoneMetadata() {

Review Comment:
   Agreed with removal of tobstone
   
   s/filtered/unfiltered/
   
   I agree with last comment, I think we should retain the last metadata for the batch, that makes it more generic and less specific. I have done the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jadami10 commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478569188

   My thinking was more towards my first comment. Both what is on master and this PR are not fundamentally right. But this version at least gives you true positives; if it says lag > 0, it is definitely > 0. I would argue that makes it more useful even in this intermediate state.
   
   That said, my feelings aren't that strong, and I'm mostly commenting as an interested observer. If you're both comfortable with/prefer a completely separate patch, that's totally fine with me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478802638

   @jugomezv are you able to reproduce this scenario in kafka? It would be worthwhile to either add an integration test or some manual way of testing this specific scenario. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1472730552

   @jugomezv thanks for re-evaluating. I was looking at the code and didn't quite understand why you can't get the timestamp for filtered batch. It should be possible without changing the API. Let's syncup over a call. I will DM you on slack. tks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1476831216

   > Internally, we have multiple kafka clusters per topic for better availability. We use our own stream ingestion plugin to make it look like 1 topic to Pinot. But it means across most tables/topics, some partitions get very few events.
   
   That's definitely not a pattern one would expect :) 
   
   @jugomezv and I discussed this on friday and I believe he is going to come up with some proposal on API changes that can provide metadata on the filtered messages. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1476823617

   >  since this also affects us for every table. 
   
   @jadami10 why does this affect every table that you have? iiuc, this only affects when all the fetched data gets filtered ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478744324

   > My thinking was more towards my first comment. Both what is on master and this PR are not fundamentally right. But this version at least gives you true positives; if it says lag > 0, it is definitely > 0. I would argue that makes it more useful even in this intermediate state.
   > 
   > That said, my feelings aren't that strong, and I'm mostly commenting as an interested observer. If you're both comfortable with/prefer a completely separate patch, that's totally fine with me.
   
   This was exactly my initial view but I @navina and @Jackie-Jiang convinced me there may be a better solution along the lines of what I just pushed. The other bad side effect on my original proposal was that if we get a sequence of batch of messages that are:
   
   [ValidEventBatch][NullMsgBatch][ValidEventBatch].... etc our metric will jump between a real value and zero creating very bad experience for users, if you can try this new patch that would be great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1154923326


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.
+   *
+   * @return null by default.
+   */
+  default public StreamMessageMetadata getLastMessageMetadata() {

Review Comment:
   Addressed in last commit 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1469126491

   Hi @Jackie-Jiang please take a look when you can, this should be simple to review. TIA


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1472666218

   I slept over this and I think there are two possible paths we could take:
   1.-Modify the stream API so that we can extract at least one time stamp of a batch of all filtered messages: this will enable the tracker to publish a precise metric. If this is too complex we can at least introduce 2:
   2.-Emit a info log after a fixed number of contiguous all-filtered batches: although this is less desirable at least will enable us to check the logs and verify if a ever increasing ingestion delay metric is due to actual problem or contiguous batches of all-filtered messages.
   
   What do you guys think? @navina @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi merged pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "sajjad-moradi (via GitHub)" <gi...@apache.org>.
sajjad-moradi merged PR #10418:
URL: https://github.com/apache/pinot/pull/10418


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1471328453

   > I don't think this is correct. Imagine we got some real messages, followed by some filtered messages, followed by some real messages. We should not set the delay to 0 in case all messages are filtered because it doesn't indicate we have already caught up. Am I missing something here?
   
   @Jackie-Jiang I see your point of view, but the situation I am trying to address here is constant repeated invocations of the loop where we filtered all messages, the current outcome is not correct correct either: an event comes in and we record that delay, then event batches are all filtered: the current outcome is time continues to ramp up to hours and hours and we are actually consuming. This is confusing and not correct. Do you have a better suggestion?
   
   We have seen this in some tables that are not actively being used and I think someone else reported earlier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478437298

   > A new API work. Does it make sense to still merge this as is in the meantime? It's still a fairly new metric.
   
   I recommend waiting on the API based change as setting ingestion delay to 0 when all records are filtered is not correct. The delay should still consider the timestamp on the filtered records. 
   If the fix was not performant, I wouldn't mind merging and fixing forward. But this pertains to correctness of the solution. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478451116

   Here is one of the logs I got from a server where the metric was ramping up:
   <img width="822" alt="Screenshot 2023-03-21 at 12 14 02 PM" src="https://user-images.githubusercontent.com/109560870/226716819-2bfe1893-a2dd-4da3-a775-40cd309b0c13.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1512229539

   > My thinking was more towards my first comment. Both what is on master and this PR are not fundamentally right. But this version at least gives you true positives; if it says lag > 0, it is definitely > 0. I would argue that makes it more useful even in this intermediate state.
   > 
   > That said, my feelings aren't that strong, and I'm mostly commenting as an interested observer. If you're both comfortable with/prefer a completely separate patch, that's totally fine with me.
   
   Now that this change is merged, do you still observe the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1154950496


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -618,14 +618,18 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     updateCurrentDocumentCountMetrics();
     if (messagesAndOffsets.getUnfilteredMessageCount() > 0) {
       _hasMessagesFetched = true;
+      if (messageCount == 0) {
+        // If we did not get any events but got some unfiltered messages, we attempt to estimate the ingestion

Review Comment:
   Since the variable is used in many other places I will keep using it, may be change it in a future PR where we replace all uses for method invocation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1467164214

   ## [Codecov](https://codecov.io/gh/apache/pinot/pull/10418?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10418](https://codecov.io/gh/apache/pinot/pull/10418?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (09d6913) into [master](https://codecov.io/gh/apache/pinot/commit/f8d432013bc5672fd04088106bf2efd69a0d48a9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f8d4320) will **decrease** coverage by `49.39%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10418       +/-   ##
   =============================================
   - Coverage     63.24%   13.85%   -49.39%     
   + Complexity     5069      259     -4810     
   =============================================
     Files          2030     2001       -29     
     Lines        110629   108931     -1698     
     Branches      16842    16647      -195     
   =============================================
   - Hits          69965    15092    -54873     
   - Misses        35538    92619    +57081     
   + Partials       5126     1220     -3906     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `13.85% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10418?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...manager/realtime/LLRealtimeSegmentDataManager.java](https://codecov.io/gh/apache/pinot/pull/10418?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvTExSZWFsdGltZVNlZ21lbnREYXRhTWFuYWdlci5qYXZh) | `0.00% <0.00%> (-70.63%)` | :arrow_down: |
   
   ... and [1830 files with indirect coverage changes](https://codecov.io/gh/apache/pinot/pull/10418/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1146899073


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.

Review Comment:
   Can you also add an example on how the batch may look like?
   Also mention that it is possible for the batch to be empty (all data is filtered), in which case the row metadata may still be non-null to facilitate ingestion delay computation.



##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.
+   *
+   * @return null by default.
+   */
+  default public StreamMessageMetadata getLastMessageMetadata() {

Review Comment:
   nit: Add `@Nullable` annotation to this method 
   
   Can we do better than returning `null`? maybe, by default, check the message at the last index and if it exists, return its metadata. if there are no messages, return null. Do you think this is better?



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/IngestionDelayTracker.java:
##########
@@ -214,12 +214,20 @@ public void updateIngestionDelay(long ingestionTimeMs, long firstStreamIngestion
       // Do not update the ingestion delay metrics during server startup period
       return;
     }
+    if ((ingestionTimeMs < 0) && (firstStreamIngestionTimeMs < 0)) {

Review Comment:
   should this conditional be an `||` instead of `&&` ?



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -618,14 +618,18 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     updateCurrentDocumentCountMetrics();
     if (messagesAndOffsets.getUnfilteredMessageCount() > 0) {
       _hasMessagesFetched = true;
+      if (messageCount == 0) {
+        // If we did not get any events but got some unfiltered messages, we attempt to estimate the ingestion

Review Comment:
   can we rephrase this comment as it is super confusing:
   `If we received events from stream but all were filtered, we attempt to ...` ?
   
   better yet,  use `batch.getMessageCount() == 0` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1491242926

   I tested this change in one of our perf test servers and observed no regression and using some tracing I observed that the special case we are trying to address is caught:
   
   2023/03/30 22:35:02.106 INFO [LLRealtimeSegmentDataManager_ReviewThrottlerMessageStateChangeEvent__1__225__20230330T1242Z] [ReviewThrottlerMessageStateChangeEvent__1__225__20230330T1242Z] [pinot-server] [] =>updateIngestionDelay() indexedMessages = 0, noTransformedRows = true
   2023/03/30 22:35:02.106 INFO [LLRealtimeSegmentDataManager_ReviewThrottlerMessageStateChangeEvent__1__225__20230330T1242Z] [ReviewThrottlerMessageStateChangeEvent__1__225__20230330T1242Z] [pinot-server] [] =>updateIngestionDelay(md) ingestion TS = 1680215702091, end to end TS = \
   1680215668086
   
   In this case we indexed zero message and we had no transformed rows but we did have metadata for a null message and we recorded a timestamp.
   
   I also verified that we do record timestamps when actual messages are indexed (no regressions):
   
   2023/03/30 22:40:37.336 INFO [LLRealtimeSegmentDataManager_CountableImpressionDiscounting__0__4664__20230329T2250Z] [CountableImpressionDiscounting__0__4664__20230329T2250Z] [pinot-server] [] =>updateIngestionDelay() indexedMessages = 9, noTransformedRows = false
   2023/03/30 22:40:37.336 INFO [LLRealtimeSegmentDataManager_CountableImpressionDiscounting__0__4664__20230329T2250Z] [CountableImpressionDiscounting__0__4664__20230329T2250Z] [pinot-server] [] =>updateIngestionDelay(md) ingestion TS = 1680216037332, end to end TS = 1680215973760


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] swaminathanmanish commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.
swaminathanmanish commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1470610512

   @jugomezv - Just curious, whats the amount of delay this causes? 
   "With this we avoid a ramping
      delay caused by one event followed by a long period of filtered messages"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1149820316


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.
+   *
+   * @return null by default.
+   */
+  default public StreamMessageMetadata getLastMessageMetadata() {

Review Comment:
   Ack to nullable.
   
   I think the second suggestion in this comment may lead to trouble: the last index may not be the message that provided the last metadata: it depends on errors etc and the external code already did the filtering for us so I don't want to have to redo that logic here. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1151423143


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -618,14 +618,18 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     updateCurrentDocumentCountMetrics();
     if (messagesAndOffsets.getUnfilteredMessageCount() > 0) {
       _hasMessagesFetched = true;
+      if (messageCount == 0) {
+        // If we did not get any events but got some unfiltered messages, we attempt to estimate the ingestion

Review Comment:
   I don't think the ovehead is a lot and it definitely trumps the readability of the code. There are 3-4 types of counts that we track here in a very C style code. Hence, the suggetion 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1478445090

   > A new API work. Does it make sense to still merge this as is in the meantime? It's still a fairly new metric.
   
   @jadami10 As discussed earlier this patch will not be displaying correct times either: so the proposal is to modify batch interface to provide timestamp even for filtered messages, that way we will publish correct metrics even in the face of all filtered events. does that make sense?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1471329754

   > @jugomezv - Just curious, whats the amount of delay this causes? "With this we avoid a ramping delay caused by one event followed by a long period of filtered messages"
   
   @swaminathanmanish in the cases reported in the original commit and some cases we have internally of tables that are not actively consuming the delay reported increases without bounds. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1471336951

   > > When all events are filtered and no actual events are processed, set consumption delay to zero as we are caught up processing actual events.
   > 
   > All events filtered in a given batch doesn't imply that we are caught up. In this scenario, the consumer returned messages , however, it was all filtered for whatever reason (is it tombstones today??). This scenario should be treated differently than when fetch messages doesn't return any messages from the consumer. Have I misunderstood the hidden assumptions in the current realtime segment data manager's implementation ?
   > 
   > > This does not affect metric reports in active tables that have incoming events, only inactive tables that have long sequences of filtered events.
   > 
   > What do you mean by "inactive tables"? Are these tables that are not consuming or tables that are consuming from an empty source? Or something else?
   
   @navina inactive tables I refer to tables that do not have events that are not filtered. The problem this is trying to tackle is the following sequence of events:
   
   EventSequnce:[Good Event-NotFiltered][allfilteredbatch][allfilteredbatch].........[allfilteredbatch]
   Delay reprted:  DelayForGoodEvent=t,t+delta, t+2delta.........................................t+n*delta
   
   If we could measure the delay on filtered events we will be OK but sadly those events are filtered, so we cant compute delay on them. 
   
   The ever increasing time is confusing as it lead us to believe that we are not consuming but in fact we are, we are just filtering messages.
   
   Hopefully, you follow the situation else I can set some time to zoom, I am open to any other suggestions that you and @Jackie-Jiang may have but current behavior does not seem right as it does not reflect the delay incurred by filtered events either.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1470584218

   I don't think this is correct. Imagine we got some real messages, followed by some filtered messages, followed by some real messages. We should not set the delay to 0 in case all messages are filtered because it doesn't indicate we have already caught up. Am I missing something here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1144142308


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,12 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * We need this to determine ingestion delay when we receive only null messages (Tombstone messages)
+   * @return last metadata for a null message received by the string
+   */
+  default public StreamMessageMetadata getLastTombstoneMetadata() {

Review Comment:
   Better not to use the term "tombstone" as this interface applies to all stream plugins. 
   
   Is this the metadata for the last filtered message or unfiltered message? 
   
   I feel that adding an API that is so specific to a single scenario is not very useful. Can we change the semantics of the `MessageBatch` to say that the message batch may have filtered the data for all records in the batch but keeps the metadata around?   



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -1574,12 +1586,24 @@ private void createPartitionMetadataProvider(String reason) {
   private void updateIngestionDelay(int indexedMessageCount) {
     if ((indexedMessageCount > 0) && (_lastRowMetadata != null)) {
       // Record Ingestion delay for this partition
-      _realtimeTableDataManager.updateIngestionDelay(_lastRowMetadata.getRecordIngestionTimeMs(),
-          _lastRowMetadata.getFirstStreamRecordIngestionTimeMs(),
-          _partitionGroupId);
+      updateIngestionDelay(_lastRowMetadata);
     }
   }
 
+  private void updateIngestionDelay(RowMetadata metadata) {
+    _realtimeTableDataManager.updateIngestionDelay(metadata.getRecordIngestionTimeMs(),
+        metadata.getFirstStreamRecordIngestionTimeMs(),

Review Comment:
   Should we update the ingestion delay metric when any of `getRecordIngestionTimeMs` or `getFirstStreamRecordIngestionTimeMs` is not valid ? Looks like the default value for this is `Long.MIN_VALUE` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jadami10 commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1512241405

   > Now that this change is merged, do you still observe the issue?
   
   due to some internal dependencies, we're a little block getting this change into our custom stream ingestion plugin. So we still haven't implemented `getLastMessageMetadata`. But I'll let definitely you know when we do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1149823454


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] PrachiKhobragade commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "PrachiKhobragade (via GitHub)" <gi...@apache.org>.
PrachiKhobragade commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1150907914


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -527,9 +527,12 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     int indexedMessageCount = 0;
     int streamMessageCount = 0;
     boolean canTakeMore = true;
+    boolean noTransformedRows = true;

Review Comment:
   rename this to hasTransformedRows = false;



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] PrachiKhobragade commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "PrachiKhobragade (via GitHub)" <gi...@apache.org>.
PrachiKhobragade commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1144111099


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,12 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * We need this to determine ingestion delay when we receive only null messages (Tombstone messages)
+   * @return last metadata for a null message received by the string

Review Comment:
   What is the string here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jadami10 commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1476960243

   A new API work. Does it make sense to still merge this as is in the meantime? It's still a fairly new metric.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1149851292


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -618,14 +618,18 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     updateCurrentDocumentCountMetrics();
     if (messagesAndOffsets.getUnfilteredMessageCount() > 0) {
       _hasMessagesFetched = true;
+      if (messageCount == 0) {
+        // If we did not get any events but got some unfiltered messages, we attempt to estimate the ingestion

Review Comment:
   yes, there was some wording issue but now I have updated it correctly, as for the last suggestion they are it is the same right, not sure what advantage would be to call the method again here if we have the outcome in messageCount other than the overhead of the call?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1151420757


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,14 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * This is useful while determining ingestion delay for a message batch. Retaining metadata for last message in
+   * a batch can enable us to estimate the ingestion delay for the batch.
+   *
+   * @return null by default.
+   */
+  default public StreamMessageMetadata getLastMessageMetadata() {

Review Comment:
   >the last index may not be the message that provided the last metadata: it depends on errors etc and the external code already did the filtering for us so I don't want to have to redo that logic here. 
   
   The fact that last index may not be the message that provides the last metadata is not clear to me. 
   
   Say the batch contains consumed = "[m1, m2, m3, m4]" and after filtering = "[m1 , m2, <filtered>, m4]". 
   Is such a batch this possible?  If yes, what should be the return value of `getLastMessageMetadata` ? m3 ?
   
   The javadoc says "Retaining metadata for last message in a batch can enable us to estimate the ingestion delay for the batch." . So, it is m4? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1487986733

   @jugomezv Thanks for the detailed write-up in the PR description on the various scenarios. Very helpful!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1154955282


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -527,9 +527,12 @@ private boolean processStreamEvents(MessageBatch messagesAndOffsets, long idlePi
     int indexedMessageCount = 0;
     int streamMessageCount = 0;
     boolean canTakeMore = true;
+    boolean noTransformedRows = true;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jadami10 commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1476238610

   I want to +1 this solution as is since this also affects us for every table. I agree it's not functionally correct, but it's functionally better than what we have today. A buggy, ever increasing metric is impossible to reason about; you don't know if you're really lagging or there's just no data. But if the metric just dips to 0 transiently, we can abstract that away in a metrics system by either 1) taking a "max" over the time period or 2) just filtering out the 0s.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jadami10 commented on pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jadami10 (via GitHub)" <gi...@apache.org>.
jadami10 commented on PR #10418:
URL: https://github.com/apache/pinot/pull/10418#issuecomment-1476827706

   > @jadami10 why does this affect every table that you have? iiuc, this only affects when all the fetched data gets filtered ?
   
   Internally, we have multiple kafka clusters per topic for better availability. We use our own stream ingestion plugin to make it look like 1 topic to Pinot. But it means across most tables/topics, some partitions get very few events. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1146811126


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java:
##########
@@ -1574,12 +1586,24 @@ private void createPartitionMetadataProvider(String reason) {
   private void updateIngestionDelay(int indexedMessageCount) {
     if ((indexedMessageCount > 0) && (_lastRowMetadata != null)) {
       // Record Ingestion delay for this partition
-      _realtimeTableDataManager.updateIngestionDelay(_lastRowMetadata.getRecordIngestionTimeMs(),
-          _lastRowMetadata.getFirstStreamRecordIngestionTimeMs(),
-          _partitionGroupId);
+      updateIngestionDelay(_lastRowMetadata);
     }
   }
 
+  private void updateIngestionDelay(RowMetadata metadata) {
+    _realtimeTableDataManager.updateIngestionDelay(metadata.getRecordIngestionTimeMs(),
+        metadata.getFirstStreamRecordIngestionTimeMs(),

Review Comment:
   This is a good point, but the checking belongs in the core method in the IngestionDelayTracker class. I have modified it in such a way we don't publish the metric that comes with negative values in the ingestion tracker so that no matter where we invoke update ingestion delay it will do the right thing. 
   
   Now we support proper publishing of metrics if only ingestion delay is supported, or only end to end is supported or both are unsupported, or both are supported, or both are unsupported



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1146818129


##########
pinot-spi/src/main/java/org/apache/pinot/spi/stream/MessageBatch.java:
##########
@@ -116,4 +116,12 @@ default StreamPartitionMsgOffset getOffsetOfNextBatch() {
   default boolean isEndOfPartitionGroup() {
     return false;
   }
+
+  /**
+   * We need this to determine ingestion delay when we receive only null messages (Tombstone messages)
+   * @return last metadata for a null message received by the string

Review Comment:
   Yeap! this was expected to be stream :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1149826701


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/IngestionDelayTracker.java:
##########
@@ -214,12 +214,20 @@ public void updateIngestionDelay(long ingestionTimeMs, long firstStreamIngestion
       // Do not update the ingestion delay metrics during server startup period
       return;
     }
+    if ((ingestionTimeMs < 0) && (firstStreamIngestionTimeMs < 0)) {

Review Comment:
   && is the intention:
   
   Case 1 Stream does not support ingestion delay nor end to end delay: we bail out early and do not record anything.
   
   If either ingestion delay or end to end ingestion delay we do record the times but only publish the one that has value larger than zero.
   
   I think in the previous incarnation the code must have been dumping negative metrics or very large number for any ingestion delay not supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10418: Fix ramping delay caused by long lasting sequence of unfiltered messa…

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10418:
URL: https://github.com/apache/pinot/pull/10418#discussion_r1151422582


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/IngestionDelayTracker.java:
##########
@@ -214,12 +214,20 @@ public void updateIngestionDelay(long ingestionTimeMs, long firstStreamIngestion
       // Do not update the ingestion delay metrics during server startup period
       return;
     }
+    if ((ingestionTimeMs < 0) && (firstStreamIngestionTimeMs < 0)) {

Review Comment:
   ok it was not obvious reading just this snippet of code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org