Posted to commits@pinot.apache.org by "dang-stripe (via GitHub)" <gi...@apache.org> on 2023/06/29 03:57:38 UTC

[GitHub] [pinot] dang-stripe opened a new issue, #11001: optimize kafka `computePartitionGroupMetadata` with KafkaAdminClient

dang-stripe opened a new issue, #11001:
URL: https://github.com/apache/pinot/issues/11001

   we recently had some dns instability issues w/ our kafka cluster that caused consumer creation to fail since it couldn't resolve broker dns. while investigating, we noticed that we were creating N consumers every time a new realtime segment was created where N is the number of partitions on the topic. we have some topics w/ a high partition count like 200.
   
   ```
   [2023-06-27 02:50:30.459797] INFO [KafkaConsumer] [HelixTaskExecutor-message_handle_thread_27:17] [Consumer clientId=example_table-example_topic-5, groupId=pinot-table-group1] Subscribed to partition(s): example_topic-100
   [2023-06-27 02:50:30.459840] INFO [KafkaConsumer] [HelixTaskExecutor-message_handle_thread_27:17] [Consumer clientId=example_table-example_topic-5, groupId=pinot-table-group1] Subscribed to partition(s): example_topic-132
   ...
   ```
   
   it seems like the default implementation of [`computePartitionGroupMetadata`](https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMetadataProvider.java#L65) creates all these consumers. i'm wondering if there are any blockers to using the `KafkaAdminClient`'s listOffsets call to achieve the same thing: https://kafka.apache.org/28/javadoc/org/apache/kafka/clients/admin/KafkaAdminClient.html#listOffsets(java.util.Map,org.apache.kafka.clients.admin.ListOffsetsOptions)
   
   here is where it's getting called for new realtime segment creation: https://github.com/apache/pinot/blob/fc26d6d8975b4cd46e26e460236a30e8b1eb2cde/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L1547-L1549
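
The proposal above (one batched admin lookup instead of one consumer per partition) can be sketched roughly as follows. This is a minimal illustration, not Pinot's or Kafka's actual code: `FakeClient`, `perPartition`, and `batched` are hypothetical names, and `FakeClient.listOffsets` only stands in for a batched call such as the Kafka admin client's `listOffsets`. The point is just to show how the number of clients opened (and so the number of DNS lookups and bootstrap connections) changes between the two patterns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: every type here is hypothetical and stands in for a real Kafka client. */
class BatchedOffsetLookup {

  /** Hypothetical stand-in for a broker connection; counts how many clients get opened. */
  static final class FakeClient implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();
    private final Map<Integer, Long> endOffsets;

    FakeClient(Map<Integer, Long> endOffsets) {
      this.endOffsets = endOffsets;
      // A real client would resolve broker DNS and open a connection here.
      OPENED.incrementAndGet();
    }

    /** One request covering many partitions (assumed analogue of a batched listOffsets). */
    Map<Integer, Long> listOffsets(List<Integer> partitions) {
      Map<Integer, Long> result = new HashMap<>();
      for (int p : partitions) {
        result.put(p, endOffsets.get(p));
      }
      return result;
    }

    @Override
    public void close() {
      // Nothing to release in this sketch.
    }
  }

  /** Pattern described in the issue: one short-lived client per partition. */
  static Map<Integer, Long> perPartition(Map<Integer, Long> broker, List<Integer> partitions) {
    Map<Integer, Long> result = new HashMap<>();
    for (int p : partitions) {
      try (FakeClient client = new FakeClient(broker)) {
        result.putAll(client.listOffsets(List.of(p)));
      }
    }
    return result;
  }

  /** Proposed pattern: one client and one batched request for all partitions. */
  static Map<Integer, Long> batched(Map<Integer, Long> broker, List<Integer> partitions) {
    try (FakeClient client = new FakeClient(broker)) {
      return client.listOffsets(partitions);
    }
  }

  public static void main(String[] args) {
    // Made-up offsets for a 200-partition topic, matching the partition count in the issue.
    Map<Integer, Long> broker = new HashMap<>();
    List<Integer> partitions = new ArrayList<>();
    for (int p = 0; p < 200; p++) {
      broker.put(p, 1000L + p);
      partitions.add(p);
    }

    FakeClient.OPENED.set(0);
    Map<Integer, Long> a = perPartition(broker, partitions);
    System.out.println("per-partition clients opened: " + FakeClient.OPENED.get());

    FakeClient.OPENED.set(0);
    Map<Integer, Long> b = batched(broker, partitions);
    System.out.println("batched clients opened: " + FakeClient.OPENED.get());

    System.out.println("same offsets: " + a.equals(b));
  }
}
```

Both methods return the same offsets; only the number of clients created differs, which is the cost the issue is concerned with when broker DNS is unstable.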


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on issue #11001: optimize kafka `computePartitionGroupMetadata` with KafkaAdminClient

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on issue #11001:
URL: https://github.com/apache/pinot/issues/11001#issuecomment-1613195924

   @dang-stripe The number of consumers created on a single server is the same as the number of partitions being consumed on that server. Additionally, the server creates multiple short-lived consumer instances to fetch topic metadata, which is what you see in `_partitionMetadataProvider`.
   This is purely due to how the code is organized in the Kafka consumer plugin in Pinot (`KafkaStreamMetadataProvider` inherits from `KafkaPartitionLevelConnectionHandler`). I also mentioned it in this issue: https://github.com/apache/pinot/issues/10014#issue-1505401315 (see the Note in the description).
   
   >  i'm wondering if there are any blockers to using the KafkaAdminClient's listOffsets call to achieve the same thing:
   
   I don't think there are any. `KafkaAdminClient` either didn't exist or was not stable when Pinot's Kafka consumer was written. We should replace the current implementation with it to avoid all the short-lived consumers.

