Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/02 05:36:43 UTC

[GitHub] [druid] jacobtolar opened a new pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

jacobtolar opened a new pull request #11865:
URL: https://github.com/apache/druid/pull/11865


   ### Description
   
   Updates the docs to add `avro_ocf` to the list of supported input formats for Kafka and Kinesis ingestion. Also updates the Kinesis docs to match the Kafka docs more closely, importing some of the changes from this PR: https://github.com/apache/druid/pull/11624/files.
   
   The `avro_ocf` input format was added here: https://github.com/apache/druid/pull/9671
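
For illustration only, here is a minimal sketch of where `avro_ocf` would sit in a Kafka supervisor spec, built as a Python dict and printed as JSON. It assumes the usual supervisor layout (`spec` containing `dataSchema` and `ioConfig`); the datasource, topic, broker address, and timestamp column are hypothetical placeholders, not values from this PR, so check the ingestion docs for your Druid version before relying on it.

```python
import json

# Hypothetical placeholders: datasource, topic, broker address, and
# timestamp column are made up for this sketch, not taken from the PR.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "example_datasource",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": []},
        },
        "ioConfig": {
            "topic": "example_topic",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            # The input format this PR documents for Kafka/Kinesis.
            "inputFormat": {"type": "avro_ocf"},
        },
    },
}

# Serialize to the JSON document that would be submitted as the spec.
spec_json = json.dumps(supervisor_spec, indent=2)
print(spec_json)
```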
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster - we have one datasource set up to ingest using `avro_ocf`, so I know that's working as documented here. I haven't tested with Kinesis but have no reason to believe it would not also work.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11865:
URL: https://github.com/apache/druid/pull/11865#issuecomment-985804768


   I'm a bit curious: [Avro OCF is a file format](https://avro.apache.org/docs/current/spec.html#Object+Container+Files). Is it common to put these files in streaming ingest messages? There is no technical reason this wouldn't work if the files are small enough to fit in the messages, since it is all just binary blobs in the end, but I was mostly wondering whether this is a common use case compared to the streaming-oriented Avro formats we support (inline schema, multi-inline-schema, schema repo, schema registry).




[GitHub] [druid] jacobtolar commented on pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

Posted by GitBox <gi...@apache.org>.
jacobtolar commented on pull request #11865:
URL: https://github.com/apache/druid/pull/11865#issuecomment-985050495


   Ah, looks like a later PR (https://github.com/apache/druid/pull/11912) entirely reworked the Kafka ingestion docs.




[GitHub] [druid] a2l007 commented on pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

Posted by GitBox <gi...@apache.org>.
a2l007 commented on pull request #11865:
URL: https://github.com/apache/druid/pull/11865#issuecomment-985030140


   @jacobtolar LGTM. Could you please resolve the conflicts?




[GitHub] [druid] a2l007 merged pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

Posted by GitBox <gi...@apache.org>.
a2l007 merged pull request #11865:
URL: https://github.com/apache/druid/pull/11865


   




[GitHub] [druid] jacobtolar commented on pull request #11865: Add avro_ocf to supported Kafka/Kinesis InputFormats

Posted by GitBox <gi...@apache.org>.
jacobtolar commented on pull request #11865:
URL: https://github.com/apache/druid/pull/11865#issuecomment-985815091


   I don't know that it's a *common* use case, but we have some scenarios where we do this. There's obviously some overhead in providing the schema in every message (the cost is amortized somewhat by packing many records into a single Kafka message), but it's nice not to need an extra component (a schema registry).
   
   The `avro_ocf` support currently works by writing every message to a file on localhost, which isn't ideal when streaming in one 'file' per message (but it technically works, if your disks are fast enough or your data volume is low enough 🙃). When I get some time I plan to submit a PR so you can configure that to happen in memory, which should make it more usable.
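
To make the amortization point concrete, here is a rough back-of-the-envelope sketch. It assumes each Kafka message is one complete OCF payload whose header carries the full schema, and it ignores the small fixed costs (magic bytes, sync markers, length prefixes). The function name and the byte sizes are hypothetical and chosen only for illustration.

```python
def schema_overhead_fraction(schema_bytes: int,
                             record_bytes: int,
                             records_per_message: int) -> float:
    """Approximate share of a message taken up by the embedded schema.

    Simplification: counts only the schema and the encoded records,
    ignoring the OCF magic bytes, sync markers, and length prefixes.
    """
    if records_per_message <= 0:
        raise ValueError("records_per_message must be positive")
    payload_bytes = record_bytes * records_per_message
    return schema_bytes / (schema_bytes + payload_bytes)


# Hypothetical sizes: a 2000-byte schema and 100-byte encoded records.
one_record = schema_overhead_fraction(2000, 100, 1)
many_records = schema_overhead_fraction(2000, 100, 200)
print(f"1 record per message:    {one_record:.1%} schema")
print(f"200 records per message: {many_records:.1%} schema")
```

With these made-up sizes, the schema dominates a single-record message but shrinks to a small fraction once a couple of hundred records share one header.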

