Posted to issues@flink.apache.org by "Dawid Wysakowicz (Jira)" <ji...@apache.org> on 2020/07/17 13:07:00 UTC

[jira] [Updated] (FLINK-18629) ConnectedStreams#keyBy can not derive key TypeInformation for lambda KeySelectors

     [ https://issues.apache.org/jira/browse/FLINK-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Wysakowicz updated FLINK-18629:
-------------------------------------
    Description: 
The following test fails:
{code}
	@Test
	public void testKeyedConnectedStreamsType() {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
		DataStreamSource<Integer> stream2 = env.fromElements(1, 2);

		ConnectedStreams<Integer, Integer> connectedStreams = stream1.connect(stream2)
			.keyBy(v -> v, v -> v);

		KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) connectedStreams.getFirstInput();
		KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) connectedStreams.getSecondInput();
		assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
		assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
	}
{code}

The problem is that the wildcard key type is evaluated as {{Object}} for lambdas, which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector provided as a lambda.
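
The effect can be reproduced without Flink. The following is a minimal sketch, not Flink code: {{Selector}} is a hypothetical stand-in for {{KeySelector}}, and the two helper methods only mirror the shape of a wildcard-keyed signature and of a signature with the key as a method type parameter. It assumes compilation with javac, and it reads the descriptor of the method the compiler generated for the lambda through {{SerializedLambda}}.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

public class LambdaKeyTypeDemo {

    /** Hypothetical stand-in for KeySelector; Serializable so the lambda can be inspected. */
    @FunctionalInterface
    public interface Selector<IN, KEY> extends Serializable {
        KEY getKey(IN value) throws Exception;
    }

    /** Key type declared as a wildcard: the lambda is compiled against Selector<Integer, Object>. */
    public static String implSignatureWildcard(Selector<Integer, ?> selector) {
        return implSignature(selector);
    }

    /** Key type declared as a method type parameter: K is inferred from the lambda body. */
    public static <K> String implSignatureGeneric(Selector<Integer, K> selector) {
        return implSignature(selector);
    }

    /** Returns the JVM descriptor of the synthetic method that implements the lambda. */
    private static String implSignature(Object lambda) {
        try {
            // Serializable lambdas carry a private writeReplace() that returns a SerializedLambda.
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
            return serialized.getImplMethodSignature();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Argument is not a serializable lambda", e);
        }
    }

    public static void main(String[] args) {
        // With javac the return type is Object: (Ljava/lang/Integer;)Ljava/lang/Object;
        System.out.println(implSignatureWildcard(v -> v));
        // With javac the return type is Integer: (Ljava/lang/Integer;)Ljava/lang/Integer;
        System.out.println(implSignatureGeneric(v -> v));
    }
}
```

Any type extraction that relies on the lambda's implementation method can therefore only see {{Object}} in the wildcard case, which is consistent with the {{GenericTypeInfo<Object>}} observed in the failing test.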

I suggest changing the method signature to:
{code}
	public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
			KeySelector<IN1, K1> keySelector1,
			KeySelector<IN2, K2> keySelector2)
{code}

This would be a source-compatible change. It might, however, break state backend compatibility, because the derived key type info would change.

As a workaround, old programs could still use the second method, which takes the key type explicitly:
{code}
	public <KEY> ConnectedStreams<IN1, IN2> keyBy(
			KeySelector<IN1, KEY> keySelector1,
			KeySelector<IN2, KEY> keySelector2,
			TypeInformation<KEY> keyType)
{code}

  was:
Following test fails:
{code}
	@Test
	public void testKeyedConnectedStreamsType() {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
		DataStreamSource<Integer> stream2 = env.fromElements(1, 2);

		ConnectedStreams<Integer, Integer> connectedStreams = stream1.connect(stream2)
			.keyBy(v -> v, v -> v);

		KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) connectedStreams.getFirstInput();
		KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) connectedStreams.getSecondInput();
		assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
		assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
	}
{code}

The problem is that the wildcard type is evaluated as {{Object}} for lambdas, which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector provided as lambda.

I suggest changing the method signature to:
{code}
	public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
			KeySelector<IN1, K1> keySelector1,
			KeySelector<IN2, K2> keySelector2)
{code}

This would be a code compatible change. Might break the compatibility of state backend (would change derived key type info). Nevertheless there is a workaround to use:

{code}
	public <KEY> ConnectedStreams<IN1, IN2> keyBy(
			KeySelector<IN1, KEY> keySelector1,
			KeySelector<IN2, KEY> keySelector2,
			TypeInformation<KEY> keyType)
{code}


> ConnectedStreams#keyBy can not derive key TypeInformation for lambda KeySelectors
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-18629
>                 URL: https://issues.apache.org/jira/browse/FLINK-18629
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.2


