Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/10 21:58:49 UTC

[GitHub] [incubator-pinot] mcvsubbu opened a new issue #5359: Extend support for other streams

mcvsubbu opened a new issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359


   Pinot code stopped referring to Kafka directly back in the 0.1.0 days. HLC can support pretty much any stream. LLC, however, still uses one property of streams in its raw form in the code -- the offset of a stream message within a partition.
   
   This offset is assumed to be a long (8 bytes). It is stored as a long in the segment ZK metadata, maintained as a long in the stream consumers, and expected to be a `long` (primitive) in all the consuming interfaces.
   
   This works fine with Kafka, Eventhub and such, but the assumption does not hold for some of the other streams.
   
   We need to extend the code to support more generic offsets. This has to be done carefully, since it can break backward compatibility and cause production outages. It is better to do it in smaller steps, making sure that we are not breaking anything.
   
   Offsets are NOT stored in on-disk segment metadata (good!)
   
   Broadly, the usage of offset is in these areas:
   
   1. The controller queries the stream's metadata to get the offset in each partition of the stream. The controller writes this offset into the ZK segment metadata as the starting offset of each realtime segment. The controller also writes the end offset into the ZK segment metadata when the segment completes.
   2. The server uses the offset to request the stream partition to return messages starting with that offset.
   3. The server and controller exchange the offset value (as a long) in the segment completion protocol.
   
   The broad steps are as follows (but the devil is in the details, and we will know better as we move along):
   
   1. Change `long` into a class (`StreamPartitionMsgOffset`? -- must be `Comparable` and `Serializable`) in all places except Kafka-specific areas. For now, use `LongOffset` as the implementation of this type. Don't change any persistence code yet.
   2. Change stream consumer interface to support the `StreamPartitionMsgOffset` class instead of a long (both metadata fetcher and data consumer interfaces).
   3. Change the Segment Completion Protocol to add a serialized offset element to the protocol. Both controller and server will pay attention to the new element if it is present. Since we will be serializing the `LongOffset` class, it should work well. The sender should include both the raw form and the serialized form in the protocol. The receiver chooses the serialized form if available, and falls back to the raw form if not.
   4. Change the segment metadata in ZK to include the serialized offset (in a new field). The deserializer will pick the serialized form if available, and otherwise use the long offset.
   5. Over time, remove the use of `long` in persistent data.
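
A minimal sketch of steps 1, 3 and 4, to make the fallback rule concrete. Only the names `StreamPartitionMsgOffset` and `LongOffset` come from the text above; the method names, the `OffsetReader` helper, and the choice of a `String` as the serialized form are assumptions for illustration and may not match what finally lands in Pinot.

```java
// Hypothetical sketch: the method signatures here are assumptions, not the Pinot API.
interface StreamPartitionMsgOffset extends Comparable<StreamPartitionMsgOffset> {
    // Serialized form, assumed here to be a String suitable for the
    // segment completion protocol and for ZK segment metadata.
    String serialize();
}

// Step 1: wrap the existing long offset without changing its meaning.
final class LongOffset implements StreamPartitionMsgOffset {
    private final long offset;

    LongOffset(long offset) {
        this.offset = offset;
    }

    static LongOffset deserialize(String serialized) {
        return new LongOffset(Long.parseLong(serialized));
    }

    long getOffset() {
        return offset;
    }

    @Override
    public String serialize() {
        return Long.toString(offset);
    }

    @Override
    public int compareTo(StreamPartitionMsgOffset other) {
        // Assumes both offsets belong to the same stream type.
        return Long.compare(offset, ((LongOffset) other).offset);
    }
}

// Steps 3 and 4: a receiver (or the ZK metadata reader) prefers the
// serialized form and falls back to the raw long written by older senders.
final class OffsetReader {
    static StreamPartitionMsgOffset read(String serialized, long raw) {
        if (serialized != null && !serialized.isEmpty()) {
            return LongOffset.deserialize(serialized);
        }
        return new LongOffset(raw);
    }
}
```

With this shape, a new receiver can still read messages and metadata written by an old sender (no serialized field, raw long only), and an old receiver can keep reading the raw long that a new sender continues to include.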


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] KKcorps commented on issue #5359: Extend offset support for other streams

Posted by GitBox <gi...@apache.org>.
KKcorps commented on issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359#issuecomment-644944083


   @mcvsubbu The `StreamPartitionMsgOffsetFactory` interface expects a `createMaxOffset` method to be implemented. But for Kinesis, the offsets are strings, and they can go out of bounds for `Long` or any other data type.
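
A small sketch of the problem described above, assuming the string offsets are decimal digit strings (as Kinesis sequence numbers appear to be). The class and method names are hypothetical, and the long value used in the usage note is made up for illustration.

```java
// Hypothetical helper, not part of Pinot or the Kinesis SDK.
final class StringOffsetCheck {
    // True if the decimal string fits in a signed 64-bit long.
    static boolean fitsInLong(String offset) {
        try {
            Long.parseLong(offset);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Compares two decimal offsets of arbitrary length numerically.
    static int compare(String a, String b) {
        return new java.math.BigInteger(a).compareTo(new java.math.BigInteger(b));
    }
}
```

For example, `fitsInLong("9223372036854775807")` is true (that is `Long.MAX_VALUE`), while a 56-digit value such as `"49590338271490256608559692538361571095921575989136588898"` does not fit, so no fixed `long` constant can serve as a maximum for such offsets.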




[GitHub] [incubator-pinot] mcvsubbu commented on issue #5359: Extend offset support for other streams

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359#issuecomment-643406985


   Keeping the issue open until @KKcorps verifies things




[GitHub] [incubator-pinot] mcvsubbu closed issue #5359: Extend offset support for other streams

Posted by GitBox <gi...@apache.org>.
mcvsubbu closed issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359


   




[GitHub] [incubator-pinot] KKcorps commented on issue #5359: Extend offset support for other streams

Posted by GitBox <gi...@apache.org>.
KKcorps commented on issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359#issuecomment-644951219


   Checked it just now. Looks good to me!




[GitHub] [incubator-pinot] mcvsubbu commented on issue #5359: Extend offset support for other streams

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #5359:
URL: https://github.com/apache/incubator-pinot/issues/5359#issuecomment-644949114


   @KKcorps I have removed that need. Please pull the latest code

