You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Lijie Wang (Jira)" <ji...@apache.org> on 2022/03/10 08:12:00 UTC
[jira] [Updated] (FLINK-26576) The value of 'readerParallelism' passed to ContinuousFileMonitoringFunction is wrong
[ https://issues.apache.org/jira/browse/FLINK-26576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lijie Wang updated FLINK-26576:
-------------------------------
Description:
In [StreamExecutionEnvironment#createFileInput |https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#:~:text=inputFormat%2C%20monitoringMode%2C-,getParallelism(),-%2C%20interval)%3B], the {{env.getParallelism()}} was passed to {{ContinuousFileMonitoringFunction}} as the parallelism of downstream readers. This value is incorrect when the parallelism of the downstream readers is manually configured by the user.
For example, in the test below, *1* will be passed as {{{}readerParallelism{}}}, but the actual parallelism of readers is {*}5{*}.
{code:java}
@Test
public void testContinuousFileMonitoringFunction() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);
final String fileContent = "line1\n" + "line2\n" + "line3\n" + "line4\n" + "line5\n";
final File file = createTempFile(fileContent);
env.readTextFile(file.getPath()).name("TextSource").setParallelism(5)
.forward()
.addSink(new PrintSinkFunction<>()).setParallelism(5);
env.execute();
}
private File createTempFile(String content) throws IOException {
File tempFile = File.createTempFile("test_contents", "tmp");
tempFile.deleteOnExit();
OutputStreamWriter wrt =
new OutputStreamWriter(new FileOutputStream(tempFile), StandardCharsets.UTF_8);
wrt.write(content);
wrt.close();
return tempFile;
}
{code}
was:
In [StreamExecutionEnvironment#createFileInput |https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#:~:text=inputFormat%2C%20monitoringMode%2C-,getParallelism(),-%2C%20interval)%3B], the {{env.getParallelism()}} was passed to {{ContinuousFileMonitoringFunction}} as the parallelism of downstream readers. This value is incorrect when the parallelism of the downstream readers is manually configured by the user.
For example, in the test below, *1* will be passed as {{readerParallelism}}, but the actual parallelism of readers is *5*.
{code:java}
// Some comments here
@Test
public void testContinuousFileMonitoringFunction() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);
final String fileContent = "line1\n" + "line2\n" + "line3\n" + "line4\n" + "line5\n";
final File file = createTempFile(fileContent);
env.readTextFile(file.getPath()).name("TextSource").setParallelism(5)
.forward()
.addSink(new PrintSinkFunction<>()).setParallelism(5);
env.execute();
}
private File createTempFile(String content) throws IOException {
File tempFile = File.createTempFile("test_contents", "tmp");
tempFile.deleteOnExit();
OutputStreamWriter wrt =
new OutputStreamWriter(new FileOutputStream(tempFile), StandardCharsets.UTF_8);
wrt.write(content);
wrt.close();
return tempFile;
}
{code}
> The value of 'readerParallelism' passed to ContinuousFileMonitoringFunction is wrong
> ------------------------------------------------------------------------------------
>
> Key: FLINK-26576
> URL: https://issues.apache.org/jira/browse/FLINK-26576
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Reporter: Lijie Wang
> Priority: Major
>
> In [StreamExecutionEnvironment#createFileInput |https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#:~:text=inputFormat%2C%20monitoringMode%2C-,getParallelism(),-%2C%20interval)%3B], the {{env.getParallelism()}} was passed to {{ContinuousFileMonitoringFunction}} as the parallelism of downstream readers. This value is incorrect when the parallelism of the downstream readers is manually configured by the user.
> For example, in the test below, *1* will be passed as {{{}readerParallelism{}}}, but the actual parallelism of readers is {*}5{*}.
> {code:java}
> @Test
> public void testContinuousFileMonitoringFunction() throws Exception {
> StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);
> final String fileContent = "line1\n" + "line2\n" + "line3\n" + "line4\n" + "line5\n";
> final File file = createTempFile(fileContent);
> env.readTextFile(file.getPath()).name("TextSource").setParallelism(5)
> .forward()
> .addSink(new PrintSinkFunction<>()).setParallelism(5);
> env.execute();
> }
> private File createTempFile(String content) throws IOException {
> File tempFile = File.createTempFile("test_contents", "tmp");
> tempFile.deleteOnExit();
> OutputStreamWriter wrt =
> new OutputStreamWriter(new FileOutputStream(tempFile), StandardCharsets.UTF_8);
> wrt.write(content);
> wrt.close();
> return tempFile;
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)