You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Ramesh Nethi <ra...@gmail.com> on 2019/01/01 07:20:39 UTC

Re: KafkaIO and added partitions

+1 for this capability.  This would enable pipelines to continue to run
when such changes need to be made.

regards
Ramesh

On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <ra...@google.com> wrote:

> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <ra...@google.com> wrote:
>
>> - New partitions will be ignored during runtime.
>> - Update will not succeed either. Error message on the workers should
>> explain the mismatch.
>>
>
> This is the current state. Supporting changes to number of partition is
> quite doable if there is enough user interested (even in the current
> UnnoundedSource API framework).
>
>>
>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jo...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> just wanted to check how does Beam KafkaIO behaves when partitions are
>>> added to the topic.
>>> Will they be picked up or ignored during the runtime?
>>> Will they be picked up on restart with state restore?
>>>
>>> Thanks,
>>> Jozef
>>>
>>

Re: KafkaIO and added partitions

Posted by Raghu Angadi <ra...@google.com>.
+1, we should do it.
The implementation could be something on these line:

   - While assigning Kafka partitions to each source split during the first
   run, assign them deterministically.
      - Current round-robin assignment works fine for single topic. But is
      not deterministic while reading from more than one topic. We
need to tweak
      the assignment to work well in that case.
   - On the worker, each reader should check the partitions for input topic
   (this can be part of existing periodic threads that checks backlog)
   - When partitions are added:
      - The readers (source splits) that new partitions belong to will
      start consuming from it. This is straight forward.
      - What if the new partition's watermark is older the current
      watermark? Can't do much about it since a watermark can not go back.
   - When the partitions are deleted:
      - This is a bit more tricky.
      - We need to handle the case a source split might not have any
      partitions assigned.
         - What should the watermark be? I think current wall time makes
         sense. Note that there could be new partitions added later.


On Wed, Jan 2, 2019 at 7:59 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> I just wanted to mention that there is quite old open issue about that:
> https://issues.apache.org/jira/browse/BEAM-727
>
>  Fell free to take this one if anyone is interested.
>
> On 2 Jan 2019, at 15:22, Juan Carlos Garcia <jc...@gmail.com> wrote:
>
> +1
>
> Am Mi., 2. Jan. 2019, 14:34 hat Abdul Qadeer <qu...@gmail.com>
> geschrieben:
>
>> +1
>>
>> On Tue, 1 Jan 2019 at 12:45, <ja...@gmail.com> wrote:
>>
>>> +1 from my side too :-)
>>> And ideally I would want to have some hooks to let me know the extra
>>> partitions have been picked up (or a way to query it).
>>>
>>> Although if that can't be provided I can work around it myself by
>>> sending some specific message to the partition that somewhere results in a
>>> visible state change in the pipeline.
>>>
>>> Also, as a quick (semi related) heads up: I will very likely soon
>>> contribute a change to the LogAppendTimePolicy so that the idle partition
>>> behavior (automatic watermark generation) can be disabled.
>>>
>>> (of course all related to my streamy-db project)
>>>
>>> Kind regards,
>>> Jan
>>>
>>>
>>> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ra...@gmail.com>
>>> wrote:
>>>
>>>> +1 for this capability.  This would enable pipelines to continue to run
>>>> when such changes need to be made.
>>>>
>>>> regards
>>>> Ramesh
>>>>
>>>> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <ra...@google.com> wrote:
>>>>
>>>>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <ra...@google.com>
>>>>> wrote:
>>>>>
>>>>>> - New partitions will be ignored during runtime.
>>>>>> - Update will not succeed either. Error message on the workers should
>>>>>> explain the mismatch.
>>>>>>
>>>>>
>>>>> This is the current state. Supporting changes to number of partition
>>>>> is quite doable if there is enough user interested (even in the current
>>>>> UnnoundedSource API framework).
>>>>>
>>>>>>
>>>>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jo...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> just wanted to check how does Beam KafkaIO behaves when partitions
>>>>>>> are added to the topic.
>>>>>>> Will they be picked up or ignored during the runtime?
>>>>>>> Will they be picked up on restart with state restore?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jozef
>>>>>>>
>>>>>>
>

Re: KafkaIO and added partitions

Posted by Alexey Romanenko <ar...@gmail.com>.
I just wanted to mention that there is quite old open issue about that:
https://issues.apache.org/jira/browse/BEAM-727 <https://issues.apache.org/jira/browse/BEAM-727>

 Fell free to take this one if anyone is interested.

> On 2 Jan 2019, at 15:22, Juan Carlos Garcia <jc...@gmail.com> wrote:
> 
> +1
> 
> Am Mi., 2. Jan. 2019, 14:34 hat Abdul Qadeer <quadeer.leo@gmail.com <ma...@gmail.com>> geschrieben:
> +1
> 
> On Tue, 1 Jan 2019 at 12:45, <jan.doms@gmail.com <ma...@gmail.com>> wrote:
> +1 from my side too :-)
> And ideally I would want to have some hooks to let me know the extra partitions have been picked up (or a way to query it).
> 
> Although if that can't be provided I can work around it myself by sending some specific message to the partition that somewhere results in a visible state change in the pipeline.
> 
> Also, as a quick (semi related) heads up: I will very likely soon contribute a change to the LogAppendTimePolicy so that the idle partition behavior (automatic watermark generation) can be disabled.
> 
> (of course all related to my streamy-db project)
> 
> Kind regards,
> Jan
> 
> 
> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ramesh.nethi@gmail.com <ma...@gmail.com>> wrote:
> +1 for this capability.  This would enable pipelines to continue to run when such changes need to be made.
> 
> regards
> Ramesh
> 
> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <rangadi@google.com <ma...@google.com>> wrote:
> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <rangadi@google.com <ma...@google.com>> wrote:
> - New partitions will be ignored during runtime. 
> - Update will not succeed either. Error message on the workers should explain the mismatch.
> 
> This is the current state. Supporting changes to number of partition is quite doable if there is enough user interested (even in the current UnnoundedSource API framework). 
> 
> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jozo.vilcek@gmail.com <ma...@gmail.com>> wrote:
> Hello,
> just wanted to check how does Beam KafkaIO behaves when partitions are added to the topic. 
> Will they be picked up or ignored during the runtime? 
> Will they be picked up on restart with state restore?
> 
> Thanks,
> Jozef


Re: KafkaIO and added partitions

Posted by Juan Carlos Garcia <jc...@gmail.com>.
+1

Am Mi., 2. Jan. 2019, 14:34 hat Abdul Qadeer <qu...@gmail.com>
geschrieben:

> +1
>
> On Tue, 1 Jan 2019 at 12:45, <ja...@gmail.com> wrote:
>
>> +1 from my side too :-)
>> And ideally I would want to have some hooks to let me know the extra
>> partitions have been picked up (or a way to query it).
>>
>> Although if that can't be provided I can work around it myself by sending
>> some specific message to the partition that somewhere results in a visible
>> state change in the pipeline.
>>
>> Also, as a quick (semi related) heads up: I will very likely soon
>> contribute a change to the LogAppendTimePolicy so that the idle partition
>> behavior (automatic watermark generation) can be disabled.
>>
>> (of course all related to my streamy-db project)
>>
>> Kind regards,
>> Jan
>>
>>
>> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ra...@gmail.com> wrote:
>>
>>> +1 for this capability.  This would enable pipelines to continue to run
>>> when such changes need to be made.
>>>
>>> regards
>>> Ramesh
>>>
>>> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <ra...@google.com> wrote:
>>>
>>>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <ra...@google.com>
>>>> wrote:
>>>>
>>>>> - New partitions will be ignored during runtime.
>>>>> - Update will not succeed either. Error message on the workers should
>>>>> explain the mismatch.
>>>>>
>>>>
>>>> This is the current state. Supporting changes to number of partition is
>>>> quite doable if there is enough user interested (even in the current
>>>> UnnoundedSource API framework).
>>>>
>>>>>
>>>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> just wanted to check how does Beam KafkaIO behaves when partitions
>>>>>> are added to the topic.
>>>>>> Will they be picked up or ignored during the runtime?
>>>>>> Will they be picked up on restart with state restore?
>>>>>>
>>>>>> Thanks,
>>>>>> Jozef
>>>>>>
>>>>>

Re: KafkaIO and added partitions

Posted by Abdul Qadeer <qu...@gmail.com>.
+1

On Tue, 1 Jan 2019 at 12:45, <ja...@gmail.com> wrote:

> +1 from my side too :-)
> And ideally I would want to have some hooks to let me know the extra
> partitions have been picked up (or a way to query it).
>
> Although if that can't be provided I can work around it myself by sending
> some specific message to the partition that somewhere results in a visible
> state change in the pipeline.
>
> Also, as a quick (semi related) heads up: I will very likely soon
> contribute a change to the LogAppendTimePolicy so that the idle partition
> behavior (automatic watermark generation) can be disabled.
>
> (of course all related to my streamy-db project)
>
> Kind regards,
> Jan
>
>
> On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ra...@gmail.com> wrote:
>
>> +1 for this capability.  This would enable pipelines to continue to run
>> when such changes need to be made.
>>
>> regards
>> Ramesh
>>
>> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <ra...@google.com> wrote:
>>
>>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <ra...@google.com>
>>> wrote:
>>>
>>>> - New partitions will be ignored during runtime.
>>>> - Update will not succeed either. Error message on the workers should
>>>> explain the mismatch.
>>>>
>>>
>>> This is the current state. Supporting changes to number of partition is
>>> quite doable if there is enough user interested (even in the current
>>> UnnoundedSource API framework).
>>>
>>>>
>>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> just wanted to check how does Beam KafkaIO behaves when partitions are
>>>>> added to the topic.
>>>>> Will they be picked up or ignored during the runtime?
>>>>> Will they be picked up on restart with state restore?
>>>>>
>>>>> Thanks,
>>>>> Jozef
>>>>>
>>>>

Re: KafkaIO and added partitions

Posted by ja...@gmail.com.
+1 from my side too :-)
And ideally I would want to have some hooks to let me know the extra
partitions have been picked up (or a way to query it).

Although if that can't be provided I can work around it myself by sending
some specific message to the partition that somewhere results in a visible
state change in the pipeline.

Also, as a quick (semi related) heads up: I will very likely soon
contribute a change to the LogAppendTimePolicy so that the idle partition
behavior (automatic watermark generation) can be disabled.

(of course all related to my streamy-db project)

Kind regards,
Jan


On Tue, 1 Jan 2019 at 08:19, Ramesh Nethi <ra...@gmail.com> wrote:

> +1 for this capability.  This would enable pipelines to continue to run
> when such changes need to be made.
>
> regards
> Ramesh
>
> On Fri, 23 Nov 2018 at 00:40 Raghu Angadi <ra...@google.com> wrote:
>
>> On Thu, Nov 22, 2018 at 10:10 AM Raghu Angadi <ra...@google.com> wrote:
>>
>>> - New partitions will be ignored during runtime.
>>> - Update will not succeed either. Error message on the workers should
>>> explain the mismatch.
>>>
>>
>> This is the current state. Supporting changes to number of partition is
>> quite doable if there is enough user interested (even in the current
>> UnnoundedSource API framework).
>>
>>>
>>> On Thu, Nov 22, 2018 at 2:15 AM Jozef Vilcek <jo...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> just wanted to check how does Beam KafkaIO behaves when partitions are
>>>> added to the topic.
>>>> Will they be picked up or ignored during the runtime?
>>>> Will they be picked up on restart with state restore?
>>>>
>>>> Thanks,
>>>> Jozef
>>>>
>>>