Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/10/09 00:42:56 UTC

[GitHub] b-slim commented on issue #6431: Add Kinesis Indexing Service to core Druid

b-slim commented on issue #6431: Add Kinesis Indexing Service to core Druid
URL: https://github.com/apache/incubator-druid/pull/6431#issuecomment-428022788
 
 
   I have looked at the abstractions/API added by this PR, and I am not convinced that this is the best abstraction for a streaming ingest task.
   First, when you read the code you cannot really tell what the semantics of T1 and T2 are. Can they be any object? Do they have to be comparable? Do they extend a number type? IMO we need to clear that up. Is it really true that every record is part of a partition and has a sequence number?
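   
   To make the question about T1 and T2 concrete, here is a minimal sketch (not code from this PR) of how a bound on the type parameter could state the contract explicitly. The class name `OrderedRecord`, the parameter names `P` and `S`, and the `isBefore` helper are all hypothetical, and the sketch assumes sequence numbers are meant to be totally ordered within a partition, which is exactly the point that needs confirming.
   
   ```java
   import java.util.Collections;
   import java.util.List;
   
   // Hypothetical sketch, not the PR's API. The bound on S states the assumed
   // contract explicitly: sequence numbers must be totally ordered.
   final class OrderedRecord<P, S extends Comparable<S>> {
       private final String streamName;
       private final P partitionId;
       private final S sequenceNumber;
       private final List<byte[]> data;
   
       OrderedRecord(String streamName, P partitionId, S sequenceNumber, List<byte[]> data) {
           this.streamName = streamName;
           this.partitionId = partitionId;
           this.sequenceNumber = sequenceNumber;
           this.data = Collections.unmodifiableList(data);
       }
   
       String streamName() { return streamName; }
       P partitionId() { return partitionId; }
       S sequenceNumber() { return sequenceNumber; }
       List<byte[]> data() { return data; }
   
       // Ordering is only defined between records of the same partition.
       boolean isBefore(OrderedRecord<P, S> other) {
           if (!partitionId.equals(other.partitionId)) {
               throw new IllegalArgumentException("records belong to different partitions");
           }
           return sequenceNumber.compareTo(other.sequenceNumber) < 0;
       }
   }
   ```
   
   With a bound like this, a reader can tell from the signature alone that an arbitrary object is not acceptable as a sequence number.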
   
   My biggest concern is that this abstraction will only work with Kafka/Kinesis. For instance, the proposed abstraction of Record as
   ```java
   Record(String streamName, T1 partitionId, T2 sequenceNumber, List<byte[]> data)
   ```
   will not fit both Pulsar and Pravega out of the box; as far as I can tell, there is no notion of partition in either system.
   
   Another question is why the data is an ordered list of byte arrays. Does the order really matter?
     
   Also, this PR models both the record ID and a position in the stream with the pair `<T1,T2>`, which can be problematic as well. I think it is better to have an abstraction for the record ID that is independent from a position in the stream.
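   
   As a rough illustration of that separation, and only as a sketch under my own assumptions, the two concepts could be modeled as distinct types. Every name below (`RecordId`, `StreamPosition`, `LongOffsetPosition`, `StreamRecord`) is hypothetical and does not exist in the PR or in Druid.
   
   ```java
   import java.util.Objects;
   
   // Hypothetical names, for illustration only.
   
   // Opaque identity of one record; says nothing about ordering.
   interface RecordId {
       String asString();
   }
   
   // A resumable point in a stream; each source decides what it contains.
   interface StreamPosition<P extends StreamPosition<P>> {
       boolean isAtOrAfter(P other);
   }
   
   // One possible position type for an offset-based source.
   final class LongOffsetPosition implements StreamPosition<LongOffsetPosition> {
       private final long offset;
   
       LongOffsetPosition(long offset) { this.offset = offset; }
   
       long offset() { return offset; }
   
       @Override
       public boolean isAtOrAfter(LongOffsetPosition other) {
           return offset >= other.offset;
       }
   }
   
   // A record carries an id and a position as two separate concepts.
   final class StreamRecord<I extends RecordId, P extends StreamPosition<P>> {
       private final I id;
       private final P position;
       private final byte[] payload;
   
       StreamRecord(I id, P position, byte[] payload) {
           this.id = Objects.requireNonNull(id);
           this.position = Objects.requireNonNull(position);
           this.payload = Objects.requireNonNull(payload);
       }
   
       I id() { return id; }
       P position() { return position; }
       byte[] payload() { return payload; }
   }
   ```
   
   The design choice here is that a source without partitions only has to supply its own `StreamPosition` implementation, while the record identity stays untouched.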
    
   Now, I understand that this currently works nicely with Kinesis and Kafka, but I am afraid that adding support for another system will lead to an entire code refactor, i.e. a PR with yet another 16K-line refactor. Thus I really recommend having a proposal as a design review and going from there.
   
