You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "fml2 (Jira)" <ji...@apache.org> on 2021/02/19 17:43:00 UTC

[jira] [Comment Edited] (KAFKA-12328) Expose TaskId partition number

    [ https://issues.apache.org/jira/browse/KAFKA-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287103#comment-17287103 ] 

fml2 edited comment on KAFKA-12328 at 2/19/21, 5:42 PM:
--------------------------------------------------------

Hrm... It's weird, but:

My setup is: I have three partitions and two nodes (pods in a kubernetes cluster). Hence I expect that one node processes one partition and the other processes two partitions (or maybe one node gets all three of them, but it's unlikely).

The log messages are made from the punctuator via log.info("blah-blah" + context().taskId());  When the scheduled punctuation runs, I get
 * From one node, one log message with taskId=0_2
 * From the other node, _two_ log messages, _both_ with taskId=0_0
 * There is no log message for taskId=0_1 at all

Can you explain this?


was (Author: fml2):
Hrm... It's weird, but:

My setup is: I have three partitions and two nodes (pods in a kubernetes cluster). Hence I expect that one node processes one partion ond the other processes two partitions (or maybe one node gets all three of them, but it's unlikely).

The log messages are made from the punctuator via log.info("blah-blah" + context().taskId());  When the scheduled punctuation runs, I get
 * From one node, one log message with taskId=0_2
 * From the other node, _two_ log messages, _both_ with taskId=0_0
 * There is no log message for taskId=0_1 at all

Can you explain this?

> Expose TaskId partition number
> ------------------------------
>
>                 Key: KAFKA-12328
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12328
>             Project: Kafka
>          Issue Type: Wish
>          Components: streams
>            Reporter: fml2
>            Priority: Major
>              Labels: needs-kip
>
> This question was posted [on stakoverflow|https://stackoverflow.com/questions/66032099/kafka-streams-how-to-get-the-partition-an-iterartor-is-iterating-over] and got an answer but the solution is quite complicated hence this ticket.
>  
> In my Kafka Streams application, I have a task that sets up a scheduled (by the wall time) punctuator. The punctuator iterates over the entries of a store and does something with them. Like this:
> {code:java}
> var store = context().getStateStore("MyStore");
> var iter = store.all();
> while (iter.hasNext()) {
>    var entry = iter.next();
>    // ... do something with the entry
> }
> // Print a summary (now): N entries processed
> // Print a summary (wish): N entries processed in partition P
> {code}
> Is it possible to find out which partition the punctuator operates on? The java docs for {{ProcessorContext.partition()}} states that this method returns {{-1}} within punctuators.
> I've read [Kafka Streams: Punctuate vs Process|https://stackoverflow.com/questions/50776987/kafka-streams-punctuate-vs-process] and the answers there. I can understand that a task is, in general, not tied to a particular partition. But an iterator should be tied IMO.
> How can I find out the partition?
> Or is my assumption that a particular instance of a store iterator is tied to a partion wrong?
> What I need it for: I'd like to include the partition number in some log messages. For now, I have several nearly identical log messages stating that the punctuator does this and that. In order to make those messages "unique" I'd like to include the partition number into them.
> Since I'm working with a single store here (which might be partitioned), I assume that every single execution of the punctuator is bound to a single partition of that store.
>  
> It would be cool if there were a method {{iterator.partition}} (or similar) to get this information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)