Posted to commits@pinot.apache.org by "navina (via GitHub)" <gi...@apache.org> on 2023/06/28 22:14:58 UTC

[GitHub] [pinot] navina opened a new issue, #10996: Questions about Stream level consumption model

navina opened a new issue, #10996:
URL: https://github.com/apache/pinot/issues/10996

   This issue is to open up a discussion around:
   1. Why, and in which use cases, does it make sense to use the stream-level consumption model in Pinot?
   2. What are the semantics offered by the stream-level consumption model? E.g., how does data from the source get partitioned into Pinot tables? How is consumption monitored in this model? IIUC, the segment naming convention is also different?
   3. Some feature differences I have noticed are listed below (please correct me if I am mistaken). I am sure there are more.
   
    Feature | HLC | LLC 
   ---|---|---
   Force commit | No | Yes 
   Stream Message metadata extraction | No (can potentially be extended) | Yes 
   Ingestion throttling | No | Yes
   
   4. Documentation is sparse about this usage and its guarantees. IIRC, there were a few examples in the OSS documentation that used the high-level consumer. Users have mistakenly used these samples with `ConsumerType.HIGHLEVEL` and ended up in long debugging sessions. One example is https://apache-pinot.slack.com/archives/CDRCA57FC/p1687987849496959?thread_ts=1687912445.703689&cid=CDRCA57FC. (When the original incident happened, we spent ~1-2 days debugging before realizing that the stream type was high level.)
   
   I would like to propose that we find a path to sunset the stream-level consumption model, but I don't want to proceed without understanding the answers to the above questions. Please help clarify.
   
   I also see comments like "This can be removed once we remove HLC implementation from the code" [link](https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/stream/PartitionLevelStreamConfig.java#L30). So I am assuming this topic has come up for discussion before :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1612224205

   Proposal accepted to sunset stream-level consumption. If it makes sense for some unpartitioned stream later, someone can implement or resurrect things as needed (better to start with a fresh implementation, since a lot has changed).




[GitHub] [pinot] navina commented on issue #10996: Questions about Stream level consumption model

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1613817871

   Makes sense, @mcvsubbu 
   
   > Add some minor code so that use of HLC throws exception in controller and server (and even broker/minion if possible) during upgrade (important that this exception be thrown during the upgrade so that admins have a chance). If too many voices come up, then retract the exception code 
   
   What is the upgrade process that admins use? I am not able to figure out where the upgrade code is or where the process is documented. Any pointers?
   
   I will follow these steps:
   1. Right away, disable the HLC integration tests and mark the stream-level APIs as deprecated with a note.
   2. Send a poll out in the OSS channels and mailing list asking known users to speak up.
   3. If no one opposes the move before we cut the next release, we can remove it right before that release. I can have the patch ready to go. If we do hear opposition, I will save the patch to be applied for the subsequent release, `0.15.*`.
   
   




[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1612220585

   cc: @sajjad-moradi 




[GitHub] [pinot] xiangfu0 commented on issue #10996: Questions about Stream level consumption model

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1614120990

   +100 on sunsetting HLC. I also suggest removing it from the latest documentation, or at least marking it as deprecated.




[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1612227779

   On your questions:
   Stream-level consumption basically involves a single server consuming all partitions of the stream. Therefore, there are no guarantees that all replicas of the table (in the realtime part) have the same data. Consequently, each replica independently "closes" the segment. Since the segments can have slightly different data, the segment name for each replica is different. Also, the "closed" segments are never uploaded to the deep store; they are retained locally on the server. If the server goes down, it needs to start afresh (it can't copy segments from somewhere else).
   
   Each server registers with a different consumer ID. The consumer IDs were also stored somewhere in ZooKeeper (and there is code for that somewhere).
   
   There were a lot of ZooKeeper watches on the controller. When each replica closed a segment, it would mark that in the segment ZK metadata. The controller watch would trigger, and it would create a new segment.
   
   All this was an operational nightmare, and that is the reason we built LLC. 
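
The replica divergence described above can be illustrated with a toy model. This is only a sketch, not Pinot code: the class `ReplicaDriftDemo` and the method `consume` are hypothetical, and the different per-replica row counts are merely a stand-in for whatever makes each replica decide locally, and at a slightly different point, to close its segment.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Pinot code): replicas that close segments independently end up with different segments. */
final class ReplicaDriftDemo {

  /**
   * Consumes offsets 0..totalRows-1 and closes a segment every rowsBeforeClose rows.
   * Rows still sitting in the open (consuming) segment are not returned.
   */
  static List<List<Integer>> consume(int totalRows, int rowsBeforeClose) {
    List<List<Integer>> closedSegments = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (int offset = 0; offset < totalRows; offset++) {
      current.add(offset);
      if (current.size() == rowsBeforeClose) {
        closedSegments.add(current);
        current = new ArrayList<>();
      }
    }
    return closedSegments;
  }

  public static void main(String[] args) {
    // Both replicas read the same 10 rows, but each closes at its own local boundary.
    System.out.println("replica A closed: " + consume(10, 4));
    System.out.println("replica B closed: " + consume(10, 5));
  }
}
```

Because the closed segments of the two replicas cover different offset ranges, a segment from one replica cannot simply stand in for a segment of the other, which is consistent with the per-replica segment names and the lack of a shared copy in deep store mentioned above.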




[GitHub] [pinot] navina commented on issue #10996: Questions about Stream level consumption model

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1613207304

   > Proposal accepted to sunset stream level consumption. if it makes sense for some unpartitioned stream later, they can implement/resurrect things as needed (better start fresh implementation, since a lot has changed)
   
   Thanks for all the context, @mcvsubbu! And I agree that a lot has changed and, if necessary, we can redesign for any new unpartitioned stream.
   
   Would the path forward be to mark the existing stream-level APIs as deprecated for the upcoming release (`0.14.0`) and actually remove the code in `0.15.0`? Happy to open a Google Doc to facilitate more discussion on this topic.




[GitHub] [pinot] navina commented on issue #10996: Questions about Stream level consumption model

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1612197810

   cc: @Jackie-Jiang @mayankshriv @npawar @mcvsubbu @yupeng9  




[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1612223488

   Related Issue #8804




[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1613615894

   > > Proposal accepted to sunset stream level consumption. if it makes sense for some unpartitioned stream later, they can implement/resurrect things as needed (better start fresh implementation, since a lot has changed)
   > 
   > Thanks for all the context, @mcvsubbu ! And I agree that a lot has changed and if necessary, we can redesign for any new unpartitioned stream.
   > 
   > Would the path forward be to mark the existing stream level APIs as deprecated for the upcoming release (`0.14.0`) and actually, remove the code in `0.15.0` ? Happy to open a g doc to facilitate more discussion on this topic.
   
   The path you proposed will certainly work.
   
   I might go even more aggressive: ask on the pinot-dev email list as well as the chat channels (since there is not much activity on the email lists) whether anyone is using HLC. My guess is that there will be silence, in which case we can actually remove the code in the next release.
   
   Another idea:
   - Disable all HLC integration tests right away.
   - Add some minor code so that use of HLC throws an exception in the controller and server (and even the broker/minion if possible) during upgrade (it is important that this exception be thrown during the upgrade so that admins have a chance to react). If too many voices come up, then retract the exception code and follow your steps.
   - If nobody complains within a month of the release, make an announcement and start removing code.




[GitHub] [pinot] mcvsubbu commented on issue #10996: Questions about Stream level consumption model

Posted by "mcvsubbu (via GitHub)" <gi...@apache.org>.
mcvsubbu commented on issue #10996:
URL: https://github.com/apache/pinot/issues/10996#issuecomment-1613845667

   Here is the process we recommend: https://docs.pinot.apache.org/operators/operating-pinot/upgrading-pinot-cluster
   
   So, as long as you add some code in the controller that detects any table in which HLC is used and throws an exception, we are good. It may mean a one-time scan of all (realtime) table configs, but that may be fine. You can add this in the controller starter.
   
   Admins are supposed to upgrade the controller first, and if it throws an exception when the new version starts, we will hear about it :)
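
A minimal sketch of the one-time scan suggested above, written against plain maps rather than Pinot's own classes. The class `HlcUsageCheck` and its methods are hypothetical, and the sketch assumes that each realtime table's stream config is available as a string map in which the consumer type sits under `stream.<streamType>.consumer.type`; how the controller starter would actually load the table configs is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical startup check (not Pinot code): reject tables still configured for HLC. */
final class HlcUsageCheck {

  /** Returns true when the given stream config map asks for the high-level consumer. */
  static boolean usesHighLevelConsumer(Map<String, String> streamConfigs) {
    String streamType = streamConfigs.get("streamType");
    if (streamType == null) {
      return false;
    }
    // Assumption: the consumer type key follows the "stream.<streamType>.consumer.type" pattern.
    String consumerTypes = streamConfigs.get("stream." + streamType + ".consumer.type");
    if (consumerTypes == null) {
      return false;
    }
    // Defensive choice for this sketch: compare case-insensitively and accept a comma-separated list.
    for (String type : consumerTypes.split(",")) {
      if (type.trim().toLowerCase(Locale.ROOT).equals("highlevel")) {
        return true;
      }
    }
    return false;
  }

  /** Scans all realtime tables once and throws if any of them still uses HLC. */
  static void failIfAnyTableUsesHlc(Map<String, Map<String, String>> streamConfigsByTable) {
    List<String> offenders = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> entry : streamConfigsByTable.entrySet()) {
      if (usesHighLevelConsumer(entry.getValue())) {
        offenders.add(entry.getKey());
      }
    }
    if (!offenders.isEmpty()) {
      throw new IllegalStateException(
          "Stream-level (HLC) consumption is no longer supported; tables still using it: " + offenders);
    }
  }

  public static void main(String[] args) {
    // Example table names and configs are made up for illustration.
    Map<String, Map<String, String>> tables = new LinkedHashMap<>();
    tables.put("clicks_REALTIME", Map.of("streamType", "kafka", "stream.kafka.consumer.type", "lowlevel"));
    tables.put("legacy_REALTIME", Map.of("streamType", "kafka", "stream.kafka.consumer.type", "highLevel"));
    try {
      failIfAnyTableUsesHlc(tables);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Collecting all offending tables before throwing, rather than failing on the first one, would let an admin see the full list from a single failed controller start.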

