Posted to user@flink.apache.org by Ben Edwards <be...@hypervolt.co.uk> on 2022/08/06 17:04:45 UTC

RichFunctions, streaming, and configuration (it's always empty)

I am new to Flink and am trying to write some unit tests for a RichFunction.
The function expects to read configuration passed in via the open method in
order to set up a network client. I am using a stream harness for my test,
customised with my own MockEnvironment + Configuration. To my surprise, the
configuration is always empty. So I did some reading and debugging and came
across this:

https://github.com/apache/flink/blob/62786320eb555e36fe9fb82168fe97855dc54056/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractUdfStreamOperator.java#L100

Since open in the superclass has no way to accept a configuration, there
seems to be no code path that ever injects anything other than an empty
configuration, which would make the configuration parameter completely
useless. I am sure I have missed some other important part of the API and
would love some insight into how to get my configuration pushed down into
my function. I note that the DataSet API has withParameters, but afaict
there is no such API for DataStream.

I haven't yet gone to production, but I am now worried that my entire plan
for configuration passing is completely suspect, and would love to hear
otherwise.

Ben

Re: RichFunctions, streaming, and configuration (it's always empty)

Posted by Ben Edwards <be...@hypervolt.co.uk>.
Hi David,

Thanks for confirming my research. Adding some more up-to-date
documentation to that function seems like an easy first contribution.

Best,
Ben

Re: RichFunctions, streaming, and configuration (it's always empty)

Posted by David Anderson <da...@apache.org>.
The configuration parameter passed to the open method is a legacy holdover
that has been retained to avoid breaking a public API, but is no longer
used.

Your options are to either get the global job parameters from the execution
context as described in [1], or to pass the configuration to a constructor
for your RichFunction (as described in [2]). Defining a constructor that
takes the configuration as a parameter is usually the preferred approach.

Cheers,
David

[1] https://stackoverflow.com/a/70273620/2000823
[2] https://stackoverflow.com/a/66909203/2000823
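
The constructor approach David recommends can be sketched without any Flink
dependency. This is only an illustration under stated assumptions:
ClientSettings, FakeClient, EnrichFunction and shipToWorker are hypothetical
names, and EnrichFunction stands in for a class that would extend one of
Flink's Rich* functions in a real job. The point it shows is that fields set
in the constructor travel with the function when it is serialized and shipped
to a worker, while the non-serializable client is marked transient and built
in open().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConstructorConfigSketch {

    /** Hypothetical settings holder; Serializable so it can travel with the function. */
    static final class ClientSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        final String host;
        final int port;

        ClientSettings(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Hypothetical stand-in for a network client that cannot be serialized. */
    static final class FakeClient {
        final String endpoint;

        FakeClient(String endpoint) {
            this.endpoint = endpoint;
        }
    }

    /** Stand-in for a rich function: configuration in the constructor, client in open(). */
    static final class EnrichFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ClientSettings settings; // serialized along with the function
        private transient FakeClient client;   // not serialized; created in open()

        EnrichFunction(ClientSettings settings) {
            this.settings = settings;
        }

        /** Mirrors the role of open(): runs on the worker before any records arrive. */
        void open() {
            client = new FakeClient(settings.host + ":" + settings.port);
        }

        String map(String value) {
            return value + "@" + client.endpoint;
        }
    }

    /** Simulates shipping the function to a worker via a Java serialization round trip. */
    static EnrichFunction shipToWorker(EnrichFunction fn)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EnrichFunction) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        EnrichFunction fn = new EnrichFunction(new ClientSettings("localhost", 9000));
        EnrichFunction onWorker = shipToWorker(fn);
        onWorker.open();
        System.out.println(onWorker.map("event-1")); // prints "event-1@localhost:9000"
    }
}
```

A side effect of this design is that a unit test can construct the function
with whatever settings it needs, with no dependency on the Configuration
argument of open.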

On Sat, Aug 6, 2022 at 12:52 PM Ben Edwards <be...@hypervolt.co.uk> wrote:

> A comment
> <https://issues.apache.org/jira/browse/FLINK-26587?focusedCommentId=17506187&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17506187>
> on this issue seems to suggest that the configuration parameter is legacy
> and I should be getting the configuration off the runtime context, but there
> is no way to inspect the configuration from the runtime context, unless we
> are talking about the global job parameters on the execution context. Is
> that the "right" way to pass configuration?

Re: RichFunctions, streaming, and configuration (it's always empty)

Posted by Ben Edwards <be...@hypervolt.co.uk>.
A comment
<https://issues.apache.org/jira/browse/FLINK-26587?focusedCommentId=17506187&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17506187>
on this issue seems to suggest that the configuration parameter is legacy
and I should be getting the configuration off the runtime context, but there
is no way to inspect the configuration from the runtime context, unless we
are talking about the global job parameters on the execution context. Is
that the "right" way to pass configuration?
