Posted to user@flink.apache.org by Mark Niehe <ma...@segment.com> on 2020/03/23 16:36:34 UTC

Lack of KeyedBroadcastStateBootstrapFunction

Hey all,

I have another question about the State Processor API. I can't seem to find
a way to create a KeyedBroadcastStateBootstrapFunction operator. The two
options currently available to bootstrap a savepoint with state are
KeyedStateBootstrapFunction and BroadcastStateBootstrapFunction. Because
these are the only two options, it's not possible to bootstrap both keyed
and broadcast state for the same operator. Are there any plans to add that
functionality or did I miss it entirely when going through the API docs?

Thanks,
-- 
<http://segment.com/>
Mark Niehe ·  Software Engineer
Integrations
<https://segment.com/catalog?utm_source=signature&utm_medium=email>  ·  Blog
<https://segment.com/blog?utm_source=signature&utm_medium=email>  ·  We're
Hiring! <https://segment.com/jobs?utm_source=signature&utm_medium=email>

Re: Lack of KeyedBroadcastStateBootstrapFunction

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Thanks! Looking forward to that.


Re: Lack of KeyedBroadcastStateBootstrapFunction

Posted by Mark Niehe <ma...@segment.com>.
Hi Gordon and Seth,

Thanks for the explanation and for opening the ticket. I'll add some details
to the ticket explaining what we're trying to do, which will hopefully add
some context.


Fwd: Lack of KeyedBroadcastStateBootstrapFunction

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
It seems like Seth's reply didn't make it to the mailing lists somehow.
Forwarding his reply below:

---------- Forwarded message ---------
From: Seth Wiesman <sj...@gmail.com>
Date: Thu, Mar 26, 2020 at 5:16 AM
Subject: Re: Lack of KeyedBroadcastStateBootstrapFunction
To: Dawid Wysakowicz <dw...@apache.org>
Cc: <us...@flink.apache.org>, Tzu-Li (Gordon) Tai <tz...@apache.org>


As Dawid mentioned, you can implement your own operator using the transform
method to do this yourself. Unfortunately, that is fairly low level and
would require you to understand some amount of Flink internals.

The real problem is that the State Processor API does not support two-input
operators. We originally skipped that because there were a number of open
questions about how best to do it, and it wasn't clear that it would be a
necessary feature. Typically, Flink users use two-input operators to do
some sort of join, and when bootstrapping state, you typically only want to
pre-fill one side of that join. KeyedBroadcastState is clearly a good
counter-argument to that.
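
To make the two-input point concrete, here is a minimal plain-Java sketch of
what bootstrapping both kinds of state for one operator would involve. It is
a toy model only: TwoInputBootstrapFunction, OperatorState, and bootstrap are
hypothetical names invented for illustration. They are not part of Flink or
the State Processor API, and this is not the design proposed in FLINK-16784.
The sketch assumes keyed state is scoped per key while broadcast state is a
single map shared by all keys, so the function needs one callback per input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in Flink.
final class KeyedBroadcastBootstrapSketch {

    /** Hypothetical two-input bootstrap function: one callback per input. */
    interface TwoInputBootstrapFunction<K, IN1, IN2> {
        void processElement(K key, IN1 value, Map<String, Long> keyedState);
        void processBroadcastElement(IN2 value, Map<String, String> broadcastState);
    }

    /** Stand-in for the state of one operator in a savepoint. */
    static final class OperatorState<K> {
        final Map<K, Map<String, Long>> keyed = new HashMap<>();  // one scope per key
        final Map<String, String> broadcast = new HashMap<>();    // shared by all keys
    }

    /** Feeds both inputs through the same function so both states are filled. */
    static <K, IN1, IN2> OperatorState<K> bootstrap(
            List<Map.Entry<K, IN1>> keyedInput,
            List<IN2> broadcastInput,
            TwoInputBootstrapFunction<K, IN1, IN2> fn) {
        OperatorState<K> state = new OperatorState<>();
        for (IN2 element : broadcastInput) {
            fn.processBroadcastElement(element, state.broadcast);
        }
        for (Map.Entry<K, IN1> entry : keyedInput) {
            // Look up (or create) the state scope that belongs to this key.
            Map<String, Long> scope =
                state.keyed.computeIfAbsent(entry.getKey(), k -> new HashMap<>());
            fn.processElement(entry.getKey(), entry.getValue(), scope);
        }
        return state;
    }

    static OperatorState<String> demo() {
        List<Map.Entry<String, Long>> events = Arrays.asList(
            new SimpleEntry<String, Long>("user-1", 3L),
            new SimpleEntry<String, Long>("user-2", 5L),
            new SimpleEntry<String, Long>("user-1", 4L));
        List<String> rules = Arrays.asList("threshold=10");

        return bootstrap(events, rules,
            new TwoInputBootstrapFunction<String, Long, String>() {
                @Override
                public void processElement(String key, Long value, Map<String, Long> keyedState) {
                    keyedState.merge("sum", value, Long::sum);  // per-key running sum
                }

                @Override
                public void processBroadcastElement(String value, Map<String, String> broadcastState) {
                    String[] parts = value.split("=", 2);       // "name=value" rule
                    broadcastState.put(parts[0], parts[1]);
                }
            });
    }

    public static void main(String[] args) {
        OperatorState<String> state = demo();
        System.out.println("user-1 sum = " + state.keyed.get("user-1").get("sum"));
        System.out.println("threshold  = " + state.broadcast.get("threshold"));
    }
}
```

A single-input bootstrap function only ever sees one of the two inputs above,
which is why neither KeyedStateBootstrapFunction nor
BroadcastStateBootstrapFunction alone can fill both states for the same
operator.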

I've opened a ticket for the feature if you would like to comment there.

https://issues.apache.org/jira/browse/FLINK-16784


Re: Lack of KeyedBroadcastStateBootstrapFunction

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi,

I am not very familiar with the State Processor API, but from a brief
look at it, I think you are right. I think the State Processor API does
not support mixing different kinds of state in a single operator for
now, at least not in a nice way. You could probably implement the
KeyedBroadcastStateBootstrapFunction yourself and use it with
KeyedOperatorTransformation#transform(org.apache.flink.state.api.SavepointWriterOperatorFactory).
I understand this is probably not the easiest task.

I am not aware of any plans to support that out of the box, but I
cc'ed Gordon and Seth, who, if I remember correctly, worked on that API. I
hope they can give you some more insight.

Best,

Dawid
