Posted to issues@flink.apache.org by "liupengcheng (Jira)" <ji...@apache.org> on 2020/08/07 13:25:00 UTC

[jira] [Updated] (FLINK-18852) StreamScan should keep the same parallelism as the input

     [ https://issues.apache.org/jira/browse/FLINK-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liupengcheng updated FLINK-18852:
---------------------------------
    Description: 
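Currently, StreamTableSourceScan/DataStreamScan does not inherit its parallelism from the upstream input. It takes the parallelism from the configuration instead. I think this is unexpected: a scan that wraps an existing DataStream should keep that stream's parallelism.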

I found this issue through a unit test (UT). Here is an example:

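{code:scala}
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.api.bridge.scala._
import scala.collection.mutable

// `left`, `right`, `StreamITCase` and `TimestampAndWatermarkWithOffset` are
// test fixtures defined outside this snippet (not shown here).

// The environment's default parallelism is set to 4
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
StreamITCase.testResults = new mutable.MutableList[String]
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(4)

// Each DataStream source is explicitly set to parallelism 1
val table1 = env.fromCollection(left)
  .setParallelism(1)
  .assignTimestampsAndWatermarks(new TimestampAndWatermarkWithOffset[(Long, String)](0))
  .toTable(tEnv, 'a, 'b)
val table2 = env.fromCollection(right)
  .setParallelism(1)
  .assignTimestampsAndWatermarks(new TimestampAndWatermarkWithOffset[(Long, String)](0))
  .toTable(tEnv, 'a, 'b)
{code}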

However, when the job is executed and its execution plan is visualized, the "from" operator (the StreamScan) has a parallelism of 4 instead of the parallelism of 1 of its input.

  !image-2020-08-07-21-22-57-843.png! 
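The two behaviors can be contrasted with a small, self-contained Scala model. This is only a sketch. `Node`, `fromConfig` and `inherited` are hypothetical names introduced for illustration. The model assumes that each plan node either has an explicitly set parallelism or has none, and it does not use Flink's planner or any Flink API.

{code:scala}
// Hypothetical model (not Flink planner code): a plan node with an optional
// explicitly set parallelism and an optional upstream input.
final case class Node(name: String, explicitParallelism: Option[Int], input: Option[Node])

object ParallelismModel {
  // Behavior reported here: a scan without its own setting takes the
  // configured default parallelism, ignoring its input.
  def fromConfig(node: Node, defaultParallelism: Int): Int =
    node.explicitParallelism.getOrElse(defaultParallelism)

  // Expected behavior: a scan without its own setting inherits the
  // parallelism of its upstream input, falling back to the default only
  // when there is no input.
  def inherited(node: Node, defaultParallelism: Int): Int =
    node.explicitParallelism.getOrElse(
      node.input.map(inherited(_, defaultParallelism)).getOrElse(defaultParallelism))

  def main(args: Array[String]): Unit = {
    val envParallelism = 4
    // Stands for fromCollection(...).setParallelism(1) plus the watermark assigner
    val source = Node("fromCollection", Some(1), None)
    val scan = Node("from (StreamScan)", None, Some(source))
    println(s"config-based: ${fromConfig(scan, envParallelism)}") // 4
    println(s"inherited:    ${inherited(scan, envParallelism)}")  // 1
  }
}
{code}

With an environment default of 4 and an upstream parallelism of 1, the config-based rule gives the scan a parallelism of 4, which matches the plan above. Inheriting from the input gives 1.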



> StreamScan should keep the same parallelism as the input
> --------------------------------------------------------
>
>                 Key: FLINK-18852
>                 URL: https://issues.apache.org/jira/browse/FLINK-18852
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.11.1
>            Reporter: liupengcheng
>            Priority: Major
>         Attachments: image-2020-08-07-21-22-57-843.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)