Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/03/03 22:17:54 UTC

[GitHub] [incubator-pinot] fx19880617 opened a new issue #6637: Kafka ingestion: Allow users to reset the offset to consume

fx19880617 opened a new issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637


   This would let users skip bad data caused by upstream issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790239001


   - It has to be stream-agnostic, so we need to take a partition number (for now) and a serialized StreamMsgOffset.
   - We don't need to seal the existing consuming segments. In fact, we cannot, since they would have turned themselves OFFLINE after encountering the consumption error. We can start new consuming segments for the given partitions at the given offsets.
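
A minimal sketch of what such a stream-agnostic request could look like. This is not Pinot code: `OffsetReset`, `parse_reset_request`, `new_consuming_segments`, and the metadata field names are hypothetical, chosen only to show that the offset can travel as an opaque serialized string next to a partition number.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OffsetReset:
    """One reset entry: a partition number plus an opaque, serialized offset."""
    partition: int
    serialized_offset: str  # kept as a string so no stream-specific type leaks in


def parse_reset_request(raw: Dict[str, str]) -> List[OffsetReset]:
    """Validate a {partition: serialized offset} mapping (hypothetical request shape)."""
    resets = []
    for partition_text, offset_text in raw.items():
        partition = int(partition_text)  # raises ValueError on a non-numeric key
        if partition < 0:
            raise ValueError(f"partition must be non-negative, got {partition}")
        if not offset_text:
            raise ValueError(f"missing offset for partition {partition}")
        resets.append(OffsetReset(partition, offset_text))
    return sorted(resets, key=lambda r: r.partition)


def new_consuming_segments(resets: List[OffsetReset]) -> List[dict]:
    """Describe one new consuming segment per listed partition (illustrative metadata only)."""
    return [
        {"partition": r.partition, "startOffset": r.serialized_offset, "status": "CONSUMING"}
        for r in resets
    ]
```

Only the partitions named in the request get a new segment; nothing here seals or touches existing segments, in line with the second bullet above.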




[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790119420


   This has to be done on a per-partition basis. A couple of ways to do this:
   - Provide a method to start the offset of the current consuming segment at a certain point. So, if a segment started consumption at offset 100, but had a problem at 120, the user can set the start offset to be (say) 125. In this case, there is loss of data from 100 to 120. Hopefully there are no bad offsets beyond 125.
   - Provide a method to skip certain offsets in a stream-partition. The user specifies an array of offsets that are to be skipped by the consumer. This is more complicated in terms of user-interface, but preserves maximum data possible.
   
   I expect the second option to be a bit more complex in implementation as well. We will need to consume each row and check whether it is taboo or not.
   
   I prefer the first approach -- big hammer but simple(r).
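
The trade-off between the two options can be sketched on a toy list of offsets. The function names are made up for this illustration, and the bad row is assumed to sit at offset 120 as in the example above; a real consumer would read from the stream rather than from a Python range.

```python
from typing import Iterable, List, Set


def consume_from(offsets: Iterable[int], start_offset: int) -> List[int]:
    """Option 1: restart consumption at start_offset; everything before it is dropped."""
    return [o for o in offsets if o >= start_offset]


def consume_skipping(offsets: Iterable[int], taboo: Set[int]) -> List[int]:
    """Option 2: check every row against the skip list and drop only those rows."""
    return [o for o in offsets if o not in taboo]


stream = range(100, 130)                    # offsets 100..129 in one partition
option_1 = consume_from(stream, 125)        # keeps 125..129
option_2 = consume_skipping(stream, {120})  # keeps everything except 120
print(len(option_1), len(option_2))         # prints "5 29"
```

Option 2 keeps more rows but has to test each one, which is the extra per-row cost mentioned above.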




[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790666889


   I suggest creating a new segment, with new metadata etc., that has the new offset. Set it to the CONSUMING state, and let the ball roll.




[GitHub] [incubator-pinot] fx19880617 commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-791044746


   > I suggest creating a new segment, with new metadata etc., that has the new offset. Set it to the CONSUMING state, and let the ball roll.
   
   I see. Then how about the current consuming segments?
   
   There are two scenarios: 
   1. If the current segment is in ERROR status, then it will stay in ERROR status forever.
   2. If the current segment is in CONSUMING status, then having two consuming segments will cause race conditions, right?




[GitHub] [incubator-pinot] fx19880617 commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790316018


   > * It has to be stream agnostic, so we need to take a partition number (for now) and StreamMsgOffset (serialized)
   
   True, I think it's fine for the API.
   
   > * We don't need to seal existing consuming segments. In fact, we cannot do that since they would have turned themselves OFFLINE having encountered the consumption error. We can start new consuming segments for the partitions provided, at the offsets provided.
   
   Do you mean we should just update the offset of the current consuming segment and then restart its consumption (drop whatever has been consumed, then restart from the given offset)?




[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-791076286


   A CONSUMING segment can go into ERROR state only during the OFFLINE to CONSUMING state transition. Once it has transitioned, it can never go into ERROR state. It will turn itself into OFFLINE state if there are consumption issues.
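
The rule above can be written down as a tiny state model. This is only an illustration of the statement, not Pinot's actual state machine; the `State` enum and both function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "OFFLINE"
    CONSUMING = "CONSUMING"
    ERROR = "ERROR"


def on_transition_to_consuming(transition_succeeded: bool) -> State:
    """OFFLINE -> CONSUMING: the only point where the segment can land in ERROR."""
    return State.CONSUMING if transition_succeeded else State.ERROR


def on_consumption_event(state: State, consumption_failed: bool) -> State:
    """After the transition, a consumption problem moves the segment to OFFLINE, never ERROR."""
    if state is State.CONSUMING and consumption_failed:
        return State.OFFLINE
    return state
```

Under this model, a segment that hit bad data mid-consumption is found in OFFLINE, not ERROR, which is the case the offset reset needs to handle.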




[GitHub] [incubator-pinot] fx19880617 commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6637:
URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790210491


   I agree with the first approach. Approach two assumes that the bad data arrives under a normal traffic load, but we have sometimes observed upstream data producers dumping a lot of data because of an error. The skipping option can handle both cases.
   
   In my view, this API should take a map from Kafka topic partitions to offsets (it need not cover all the partitions), seal the segments of the affected partitions, and then create new segments using the given start offsets.

