Posted to user@flink.apache.org by Curt Buechter <tr...@gmail.com> on 2021/09/03 17:58:03 UTC

pyflink table to datastream

I have a question about how the conversion from the Table API to the DataStream API
actually works under the covers.

If I have a Table API operation that creates a random id, like:

SELECT id, CAST(UUID() AS VARCHAR) as random_id FROM table

...then I convert this table to a DataStream with

t_env.to_retract_stream(table, type_info)

will the UUID() function be re-evaluated on the DataStream side?

My hope was that it wouldn't, and that the DataStream would simply carry the data as it
was produced by the Table API, but what I am seeing is that I end up with two different
random_id values.
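
For reference, a minimal sketch of the kind of pipeline I mean (the datagen source,
table name and type info below are just placeholders, not my actual job):

from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# Placeholder source; the real job reads from an actual connector.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE source_table (
        id INT
    ) WITH (
        'connector' = 'datagen',
        'number-of-rows' = '5'
    )
""")

# The query that attaches a random id to every row.
table = t_env.sql_query(
    "SELECT id, CAST(UUID() AS VARCHAR) AS random_id FROM source_table"
)

# Convert to a retract stream; each element is an (is_insert, Row) pair.
type_info = Types.ROW([Types.INT(), Types.STRING()])
stream = t_env.to_retract_stream(table, type_info)
stream.print()

env.execute("uuid-to-datastream")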

Thanks!

Re: pyflink table to datastream

Posted by Caizhi Weng <ts...@gmail.com>.
Hi!

I don't quite understand this question, but I suppose you first run the Table API
program, then run the DataStream program, and you want the results of the two runs
to be identical?

If this is the case, the job will run twice, as Flink does not cache the result of a
job, so in each run the UUID() function is evaluated separately and will produce
different results.
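
For example (reusing the placeholder pipeline from the first mail; this is only a
sketch of the two-runs situation, not your actual code):

# Run 1: collecting the table result executes the query once.
with table.execute().collect() as results:
    first_run = [row for row in results]

# Run 2: converting to a retract stream and executing the DataStream pipeline
# submits a second job, so UUID() is evaluated again and produces new values.
stream = t_env.to_retract_stream(table, Types.ROW([Types.INT(), Types.STRING()]))
stream.print()
env.execute("second-run")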
