Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/12/01 02:41:01 UTC

[GitHub] [incubator-pinot] Aka-shi opened a new issue #6302: Support for pausing the realtime consumption without disabling the table.

Aka-shi opened a new issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302


   Currently, the only way to pause realtime consumption from Kafka is to disable the table, which also makes the table unavailable for querying.
   
   It would be helpful to support stopping only the realtime consumption while keeping the table available for querying.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-795726785


   @Eywek, as you describe it, this seems to be a one-time operation. Am I right? Once the data is loaded into Pinot, you intend to shut off consumption and just query the data, i.e. you do not expect more data to arrive in the realtime pipeline. Is that right?


[GitHub] [incubator-pinot] Eywek edited a comment on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
Eywek edited a comment on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-795668118


   > Question for the community: Are there others who need this feature?
   
   👋 yes. We're planning to use Pinot to store some data from our customers. We currently pull data from some APIs for our customers, transform it, and store it. As this data can be quite large, we need to stream it and push the transformed data as a stream into Pinot. To do this, we plan to use a REALTIME table, because batch ingestion would mean relying on an S3 bucket and we wouldn't be able to know when the data is available for queries.
   But once we've finished pulling data from those APIs, we won't update the table anymore, so being able to stop ingestion while keeping the table queryable would be really useful to avoid putting unnecessary pressure on our Apache Pulsar cluster (we use [KoP](https://github.com/streamnative/kop) to be able to use the Kafka ingestion plugin).


[GitHub] [incubator-pinot] Eywek edited a comment on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
Eywek edited a comment on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-795668118


   > Question for the community: Are there others who need this feature?
   
   👋 yes. We're planning to use Pinot to store some data from our customers. We currently pull data from some APIs for our customers, transform it, and store it. As this data can be quite large, we need to stream it and push the transformed data as a stream into Pinot. To do this, we plan to use a REALTIME table, because batch ingestion would mean relying on an S3 bucket and we wouldn't be able to know when the data is available for queries.
   But once we've finished pulling data from those APIs, we won't update the table anymore, so being able to stop ingestion while keeping the table queryable would be really useful to avoid putting unnecessary pressure on our Apache Pulsar (we use [KoP](https://github.com/streamnative/kop) to be able to use the Kafka ingestion plugin).


[GitHub] [incubator-pinot] Eywek commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
Eywek commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-795668118


   > Question for the community: Are there others who need this feature?
   
   👋 yes. We're planning to use Pinot to store some data from our users. We currently pull data from some APIs for our customers, transform it, and store it. As this data can be quite large, we need to stream it and push the transformed data as a stream into Pulsar. To do this, we plan to use a REALTIME table, because batch ingestion would mean relying on an S3 bucket and we wouldn't be able to know when the data is available for queries.
   But once we've finished pulling data from those APIs, we won't update the table anymore, so being able to stop ingestion while keeping the table queryable would be really useful to avoid putting unnecessary pressure on our Apache Pulsar (we use [KoP](https://github.com/streamnative/kop) to be able to use the Kafka ingestion plugin).


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-777882960


   Three questions, @Aka-shi:
   1. After a pause, if the server is restarted, how do you want the server to come back up? Should it consume up to the exact same paused place again? Should it simply not consume after the last completed/committed point? Should it forget that it was paused?
   2. If there are multiple replicas, the above question becomes even harder, since the behavior has to be coordinated across all replicas. I suppose it is not OK for one replica to forget the pause while another replica is still paused.
   3. Further if there are multiple replicas, do you want them all to pause at the same place after a pause command?


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-777887740


   Other questions:
   1. Would you want all partitions of the stream to pause, or just some specific partitions?
   2. Would you want all partitions/tables in a given server to pause?


[GitHub] [incubator-pinot] Aka-shi commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
Aka-shi commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-778922524


   @mcvsubbu 
   > Would you want all partitions of the stream to pause, or just some specific partitions?
   
   For all partitions. Consumption for the table itself should be stopped. I was thinking of something like this:
   1. User pauses stream -> Pinot server commits the current consuming segments (for all partitions) along with their offsets.
   2. User resumes stream -> Pinot starts consumption from all partitions from previously committed offsets. 
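The two steps above can be modelled as a small state machine. This is a minimal sketch under stated assumptions: `RealtimeTable`, `PartitionConsumer`, and their methods are hypothetical names for illustration only, not Pinot classes or APIs, and a "segment commit" is reduced to recording the offset reached.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical model of the proposed table-level pause/resume flow.
# None of these names are Pinot APIs; they only illustrate the semantics.
@dataclass
class PartitionConsumer:
    partition: int
    current_offset: int = 0    # next offset this consumer would read
    committed_offset: int = 0  # offset covered by the last committed segment

    def consume(self, num_messages: int) -> None:
        self.current_offset += num_messages

    def commit(self) -> None:
        # Committing the consuming segment records how far it got.
        self.committed_offset = self.current_offset


@dataclass
class RealtimeTable:
    partitions: List[PartitionConsumer]
    paused: bool = False

    def consume(self, num_messages: int) -> None:
        if self.paused:
            return  # no consumption while paused; committed data stays queryable
        for p in self.partitions:
            p.consume(num_messages)

    def pause(self) -> Dict[int, int]:
        # Step 1: commit the current consuming segments for all partitions.
        for p in self.partitions:
            p.commit()
        self.paused = True
        return {p.partition: p.committed_offset for p in self.partitions}

    def resume(self) -> Dict[int, int]:
        # Step 2: restart every partition from its previously committed offset.
        for p in self.partitions:
            p.current_offset = p.committed_offset
        self.paused = False
        return {p.partition: p.current_offset for p in self.partitions}
```

In this sketch, messages offered while the table is paused are simply not consumed; after `resume()`, each partition continues from the offset recorded at pause time.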
   
   >Would you want all partitions/tables in a given server to pause?
   
   I was expecting the pause option to be at the table level, because of #6555. If pause and reset APIs are available at the table level, then the user can pause the current stream (which would commit the current segments and pause the stream), reset the offsets, and resume consumption from the earliest/latest offsets as per the config.
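The pause, reset, resume sequence described above can be sketched as a single offset-reset step that is only allowed once the table is paused. This is a hypothetical illustration: `reset_offsets` and the `"earliest"`/`"latest"` policy strings are assumptions standing in for the table's offset-reset config, not an existing Pinot API.

```python
from typing import Dict


# Hypothetical sketch of the pause -> reset -> resume sequence from #6555.
# The function name and the "earliest"/"latest" policy values are
# illustrative assumptions, not Pinot APIs or config keys.
def reset_offsets(
    paused: bool,
    committed: Dict[int, int],
    earliest: Dict[int, int],
    latest: Dict[int, int],
    policy: str,
) -> Dict[int, int]:
    """Return the offsets each partition should resume from after a reset."""
    if not paused:
        # Resetting while segments are still consuming would race with them.
        raise RuntimeError("pause consumption before resetting offsets")
    if policy == "earliest":
        source = earliest
    elif policy == "latest":
        source = latest
    else:
        raise ValueError(f"unknown offset reset policy: {policy!r}")
    # Only partitions known to the paused table are reset.
    return {partition: source[partition] for partition in committed}
```

Requiring the pause first means the committed offsets are stable when they are replaced, so the subsequent resume starts every partition from a single, well-defined position.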
   
   >After a pause, if the server is restarted, how do you desire the server should come back up? Should it consume up to the exact same paused place again? Should it simply not consume after the last completed/committed point? Should it forget that it was paused?
   
   If a table is paused and a server restarts afterwards, then, since the previously consuming segments were already committed when the table was paused, I would expect the server not to consume after the restart either. My understanding is that when we pause the stream, we are changing the state of the table, and it should not start consuming until the user resumes the stream.
   
   >Further if there are multiple replicas, do you want them all to pause at the same place after a pause command?
   
   Yes. That's what I would expect when I pause the stream. I feel that no replica of any partition consumer should be active.
   
   PS: Just putting it out here. When we pause the table, the table should still be available for querying the already consumed data. If not, this would be more or less like the enable/disable API. 
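The restart and replica expectations above imply that the paused flag is table-level state shared by all servers rather than something each server keeps in memory. A minimal sketch, assuming a shared `TableState` object that stands in for a cluster metadata store; `TableState` and `Replica` are hypothetical names, not Pinot components.

```python
from dataclasses import dataclass, field
from typing import Dict


# Hypothetical sketch: the paused flag lives in shared, table-level state
# (standing in for a cluster metadata store), not in server memory, so a
# restarted server and every replica reach the same decision.
@dataclass
class TableState:
    paused: bool = False
    committed_offsets: Dict[int, int] = field(default_factory=dict)


class Replica:
    def __init__(self, name: str, shared: TableState) -> None:
        self.name = name
        self.shared = shared
        self.consuming = False

    def start(self) -> None:
        # On (re)start, consult the shared state instead of local memory.
        self.consuming = not self.shared.paused

    def start_offset(self, partition: int) -> int:
        # Consumption resumes from the last committed point, never from a
        # replica-local position, so all replicas agree on where to start.
        return self.shared.committed_offsets.get(partition, 0)
```

With this design a restarted replica cannot "forget" the pause, and two replicas can never disagree about whether the table is paused, because neither holds its own copy of that decision.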


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-779426187


   Thanks for clarifying. So, you are not really looking for a "pause" (which I assumed to mean pause without committing segments). You are looking for a "commit now and do not consume until further instructions".
   
   Last question (should have been the first). What is the use case for this?
   
   Question for the community: Are there others who need this feature?


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-777883309


   @snleee, @sajjad-moradi, if you have other questions or thoughts, please chime in.


[GitHub] [incubator-pinot] Eywek commented on issue #6302: [pinot]Support for pausing the realtime consumption without disabling the table.

Posted by GitBox <gi...@apache.org>.
Eywek commented on issue #6302:
URL: https://github.com/apache/incubator-pinot/issues/6302#issuecomment-795729348


   yep exactly

