You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/09 01:47:39 UTC

[GitHub] [druid] josephglanville opened a new pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

josephglanville opened a new pull request #11224:
URL: https://github.com/apache/druid/pull/11224


   Implements an indexing service extension for reading from Apache Pulsar.
   
   Addtionally includes an implementation of `AvroBytesDecoder` that takes advantage of passing through `PulsarRecordEntity` to extract schema information such that messages can be automatically decoded into Avro `GenericRecord` types.
   
   Fixes #7030 
   
   ### Description
   
   
   ##### Key changed/added classes in this PR
    * `PulsarSchemaRegistryAvroBytesDecoder`
    * `PulsarSupervisor`
    * `PulsarIndexTask`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] polyzos commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
polyzos commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-904477231


   @josephglanville yes it would be great if we could push this effort forward, i think it can unlock many use cases.
   Wondering what are the missing pieces and what needs to be done. Are there any points to reference?
   Won't have much bandwidth over the next weeks, but i want to try an allocate some time to better understand the work thats in place along with next steps.
   And maybe slowly piece by piece we can get this done 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-839598821


   @binlijin I see. Ok, I will make the required changes. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bananaaggle commented on pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
bananaaggle commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-835696627


   > @bananaaggle I have invited you as a collaborator to my fork. I think my branch is currently a little more complete.
   > Should credit @rueian as I adapted some of their code as a starting point for my implementation.
   > 
   > The main things left to do are implement tests and do some real world testing. I think we can spin up in-process Pulsar clusters to do this similar to how testing is done in https://github.com/streamnative/mop and https://github.com/streamnative/kop which should allow for fairly robust tests.
   > 
   > Also need to work out correct place for the extension to live, I put it in extensions-core for now but perhaps it belongs in extensions-contrib?
   > 
   > It also isn't clear where `PulsarSchemaRegistryAvroBytesDecoder` should live, it could either live in `avro-extensions` as it does now or could be put in `pulsar-indexing-service`.
   
   I think PulsarSchemaRegistryAvroBytesDecoder should be separated into another commit. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-835853615


   It's in a separate commit right now, if you mean a separate PR can certainly do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] binlijin commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
binlijin commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-836345010


   I think PartitionIdType/SequenceOffsetType for pulsar should be String/MessageId or String/String like in pr https://github.com/apache/druid/pull/11223 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-839403098


   @binlijin I considered that but using `MessageIDUtils.offset` seemed simpler. Is the motivation to use `MessageID` or `String` mostly to make it easier for human operators to inspect the current offsets? Are there any other benefits to this over the offset?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] polyzos commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
polyzos commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-899686129


   @josephglanville great job on this one.. im wondering why this work has stayed idle though and what needs to be done in order to push the effort forward. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bananaaggle commented on pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
bananaaggle commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-835640389


   > @bananaaggle I noticed you opened #11223 today also. Perhaps we could merge our efforts?
   
   Yes. How can I do for that? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-835622484


   @bananaaggle I noticed you opened #11223 today also. Perhaps we could merge our efforts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingServer and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-835654887


   @bananaaggle I have invited you as a collaborator to my fork. I think my branch is currently a little more complete.
   Should credit @rueian as I adapted some of their code as a starting point for my implementation.
   
   The main things left to do are implement tests and do some real world testing. I think we can spin up in-process Pulsar clusters to do this similar to how testing is done in https://github.com/streamnative/mop and https://github.com/streamnative/kop which should allow for fairly robust tests.
   
   Also need to work out correct place for the extension to live, I put it in extensions-core for now but perhaps it belongs in extensions-contrib?
   
   It also isn't clear where `PulsarSchemaRegistryAvroBytesDecoder` should live, it could either live in `avro-extensions` as it does now or could be put in `pulsar-indexing-service`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] binlijin commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
binlijin commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-839510544


   https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client/src/main/java/org/apache/pulsar/client/util/MessageIdUtils.java#L30
   
           // Combine ledger id and entry id to form offset
           // Use less than 32 bits to represent entry id since it will get
           // rolled over way before overflowing the max int range
   
   Because it can be overflow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-899952267


   @polyzos mostly just lack of time on my part. Happy to collaborate with anyone that has bandwidth though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville commented on pull request #11224: PulsarIndexingService and PulsarSchemaRegistryBytesDecoder

Posted by GitBox <gi...@apache.org>.
josephglanville commented on pull request #11224:
URL: https://github.com/apache/druid/pull/11224#issuecomment-904635909


   @polyzos I'm active on the Pulsar and Apache Slack instances if you want to chat. I will probably be looping back to this shortly once I finish my current tasks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org