Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/26 16:30:43 UTC

[GitHub] [incubator-pinot] kishoreg opened a new issue #5928: Add connector for Pravega

kishoreg opened a new issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928


   https://www.pravega.io/
   
   Similar to the Kafka connector, it would be great to add a connector for Pravega.
   
   Some info on [Pravega vs Kafka](https://siliconangle.com/2017/04/17/dell-emc-takes-on-streaming-storage-with-open-source-solution-pravega-ffsf17/#:~:text=Kaitchuck%20outlined%20how%20Pravega%20differs,Pravega%20is%20a%20streaming%20system.&text=Pravega%20offers%20a%20system%20with,one%20or%20two%20big%20writes)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg edited a comment on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
kishoreg edited a comment on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-680990156


   We did some prior work to support other streaming systems (with Kinesis in mind).
   #5359 Extend offset support for other streams




[GitHub] [incubator-pinot] fpj commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
fpj commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-698823572


   Hi @npawar,  I missed your message, sorry about that. 
   
   > is "set of segments" like partitions within the stream?
   
   Yes, but we do not expose segments individually. The readers in a group coordinate among themselves to assign enabled segments. Keep in mind that the set of segments in a Pravega stream can change over time because of stream scaling.
   
    > Can we use them to instead add a PartitionLevelConsumer implementation?
   
   We can't with the stream API, for the reason I stated above. The batch API, which does not follow any segment order and simply provides iterators over a stream, can be used and is more aligned with what you are asking for, but it is not event-driven. The iterators read events only up to a point in the stream.
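
   To make "read the events up to a point in the stream" concrete, here is a toy model of a bounded read. It is only a sketch and not Pravega's batch API: `ToySegment`, `boundedIterator`, and `BoundedReadDemo` are hypothetical names, and the segment is just an in-memory list. The iterator fixes its end offset when it is created, so events appended afterwards are not delivered; a caller has to ask for a new iterator to see them, which is why this style of read is not event-driven.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical in-memory segment; not a Pravega type.
final class ToySegment {
    private final List<String> events = new ArrayList<>();

    void append(String event) {
        events.add(event);
    }

    // Batch-style read: the end offset is fixed when the iterator is created.
    Iterator<String> boundedIterator() {
        final int end = events.size();
        return new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < end;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return events.get(next++);
            }
        };
    }
}

class BoundedReadDemo {
    // Collects everything the iterator is willing to return.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        ToySegment segment = new ToySegment();
        segment.append("a");
        segment.append("b");

        Iterator<String> first = segment.boundedIterator();
        segment.append("c"); // arrives after the iterator was created

        System.out.println(drain(first));                     // [a, b]
        System.out.println(drain(segment.boundedIterator())); // [a, b, c]
    }
}
```

   A partition-style consumer built on reads like this would have to poll for new iterators, in contrast to a reader that is handed events as they arrive.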
   
    > Is there any doc where I can read the detailed design of Pravega stream APIs and particularly the checkpointing?
   
   You may want to explore the documentation here:
   
   https://pravega.io/docs/latest/
   
   and in particular the "Developing Pravega Applications" section. You may also want to check the javadocs:
   
   https://pravega.io/docs/latest/javadoc/clients/io/pravega/client/stream/ReaderGroup.html#initiateCheckpoint-java.lang.String-java.util.concurrent.ScheduledExecutorService-
   
   The best current example of how to use checkpoints is Apache Flink.




[GitHub] [incubator-pinot] kishoreg commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-680989410


   @mcvsubbu, @npawar, and I had an initial discussion on this topic, but our understanding of Pravega is too limited to propose a complete solution. Flavio, it would be great if you could drive this discussion and guide us in the right direction.
   
   




[GitHub] [incubator-pinot] kishoreg commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-680989516


   @fpj






[GitHub] [incubator-pinot] kishoreg commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-680990156


   We did some prior work to extend support for other streaming systems (with Kinesis in mind).
   #5359 Extend offset support for other streams




[GitHub] [incubator-pinot] kishoreg commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-682738316


   @mcvsubbu @npawar @KKcorps thoughts?




[GitHub] [incubator-pinot] fpj commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
fpj commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-681121098


   Thanks for starting this issue, @kishoreg. I'm wondering what the best way to approach implementing a connector would be. I started by looking at the SPI interfaces, in particular `StreamLevelConsumer`. Here is why.
   
   Pravega has a few different APIs, the main one being the event stream API. It also has a batch API that enables unordered reads for parallelism, but let's focus on the event stream API for now. A Pravega stream comprises a set of parallel segments, and that set can change over time according to a scaling policy. We do not expose the complexity of dealing with a changing set of segments; instead, the readers in a group coordinate the assignment of segments internally, respecting order.
   
   Given that segments aren't clearly exposed in the event stream API, the `StreamLevelConsumer` interface seems to provide the right level of abstraction, except for the commit call. We don't provide the ability to commit per reader, the way Kafka lets each consumer commit. Pravega reader groups instead produce checkpoints, which are consistent collections of offsets for the segments currently being read. The application sees checkpoints.
   
   Our approach to recording positions of one or more streams is consequently more coarse-grained and coordinated across the group. I wanted to understand how I can introduce checkpoints given that the interface expects a commit implementation.
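
   One way to picture the mismatch described above, as a minimal sketch rather than a proposed design: a commit-style call could persist the most recent checkpoint completed by the whole group instead of a per-reader position. Every name below (`GroupCheckpoint`, `CheckpointCommitAdapter`, `onCheckpoint`, `committed`) is hypothetical, the no-argument `commit()` is an assumed shape, and the code uses neither Pinot's nor Pravega's actual API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a reader-group checkpoint: a consistent snapshot of
// offsets for every segment being read at the time it was taken.
final class GroupCheckpoint {
    final String name;
    final Map<String, Long> segmentOffsets;

    GroupCheckpoint(String name, Map<String, Long> segmentOffsets) {
        this.name = name;
        // Defensive copy so the snapshot cannot change after it is taken.
        this.segmentOffsets = Collections.unmodifiableMap(new HashMap<>(segmentOffsets));
    }
}

// Hypothetical adapter: commit() carries no offset, so it promotes the latest
// checkpoint completed by the group instead of a per-reader position.
final class CheckpointCommitAdapter {
    private GroupCheckpoint pending;   // completed by the group, not yet committed
    private GroupCheckpoint committed; // what a restart would resume from

    // Called when the reader group reports a completed checkpoint.
    void onCheckpoint(GroupCheckpoint checkpoint) {
        pending = checkpoint;
    }

    // Commit-style call. Returns true only if the committed position advanced.
    boolean commit() {
        if (pending == null) {
            return false;
        }
        committed = pending;
        pending = null;
        return true;
    }

    GroupCheckpoint committed() {
        return committed;
    }
}

class CheckpointCommitDemo {
    public static void main(String[] args) {
        CheckpointCommitAdapter adapter = new CheckpointCommitAdapter();
        System.out.println(adapter.commit()); // false: no checkpoint yet

        Map<String, Long> offsets = new HashMap<>();
        offsets.put("segment-0", 120L);
        offsets.put("segment-1", 95L);
        adapter.onCheckpoint(new GroupCheckpoint("cp-1", offsets));

        System.out.println(adapter.commit());         // true
        System.out.println(adapter.committed().name); // cp-1
    }
}
```

   In this sketch a commit that arrives between checkpoints is a no-op, which reflects the coarser granularity: positions advance only when the group as a whole completes a checkpoint.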
   
   I'd love to get some input and hopefully some ideas. 




[GitHub] [incubator-pinot] npawar commented on issue #5928: Add connector for Pravega

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #5928:
URL: https://github.com/apache/incubator-pinot/issues/5928#issuecomment-690594852


   `A Pravega stream comprises a set of parallel segments, and that set can change over time according to a scaling policy` - is "set of segments" like partitions within the stream? Can we use them to instead add a PartitionLevelConsumer implementation? 
   The reason I ask is that we have been steering away from StreamLevelConsumer for all practical purposes, because it is unfriendly to rebalancing and capacity changes.
   Is there any doc where I can read the detailed design of Pravega stream APIs and particularly the checkpointing?
   

