You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Elias Levy <fe...@gmail.com> on 2018/08/28 17:52:27 UTC

Why don't operations on KeyedStream return KeyedStream?

Operators on a KeyedStream don't return a new KeyedStream.  Is there a
reason for this?  You need to perform `keyBy` again to get a KeyedStream.
Presumably if you key by the same value there won't be any shuffled data,
but the key may no longer be available within the stream record.

Re: Why don't operations on KeyedStream return KeyedStream?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Elias,

Your assumption is correct. An operation on a KeyedStream results in a
regular DataStream because the operation might change the data type or the
key field.
Hence, it is not guaranteed that the same keys can be extracted from the
output of the keyed operation.

However, there is a way to convert a DataStream into a KeyedStream with
DataStreamUtils.reinterpretAsKeyedStream(DataStream, KeySelector).
This method returns a DataStream as KeyedStream without performing any
checks. You must ensure that the data is correctly partitioned. Note, that
it must be exactly the same partitioning that would be obtained by keyBy(),
i.e., it is not sufficient that the keys are partitioned in some way.

Best,
Fabian

Am Mi., 29. Aug. 2018 um 04:10 Uhr schrieb vino yang <yanghua1127@gmail.com
>:

> Hi Elias,
>
> Can you express this matter more clearly?
> The reason the KeyedStream object exists is that it needs to provide some
> different transform methods than the DataStream object.
> These transform methods are limited to keyBy.
> Why do you need to execute keyBy twice to get a KeyedStream object?
> You can define one and reuse it, can't you?
>
> Thanks, vino.
>
> Elias Levy <fe...@gmail.com> 于2018年8月29日周三 上午1:52写道:
>
>> Operators on a KeyedStream don't return a new KeyedStream.  Is there a
>> reason for this?  You need to perform `keyBy` again to get a KeyedStream.
>> Presumably if you key by the same value there won't be any shuffled data,
>> but the key may no longer be available within the stream record.
>>
>

Re: Why don't operations on KeyedStream return KeyedStream?

Posted by vino yang <ya...@gmail.com>.
Hi Elias,

Can you express this matter more clearly?
The reason the KeyedStream object exists is that it needs to provide some
different transform methods than the DataStream object.
These transform methods are limited to keyBy.
Why do you need to execute keyBy twice to get a KeyedStream object?
You can define one and reuse it, can't you?

Thanks, vino.

Elias Levy <fe...@gmail.com> 于2018年8月29日周三 上午1:52写道:

> Operators on a KeyedStream don't return a new KeyedStream.  Is there a
> reason for this?  You need to perform `keyBy` again to get a KeyedStream.
> Presumably if you key by the same value there won't be any shuffled data,
> but the key may no longer be available within the stream record.
>