You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@flink.apache.org by "Dong Lin (Jira)" <ji...@apache.org> on 2022/12/29 08:00:00 UTC

[jira] [Created] (FLINK-30533) IteratorSourceReaderBase#pollNext() should push records to ReaderOutput in a while loop

Dong Lin created FLINK-30533:
--------------------------------

             Summary: IteratorSourceReaderBase#pollNext() should push records to ReaderOutput in a while loop
                 Key: FLINK-30533
                 URL: https://issues.apache.org/jira/browse/FLINK-30533
             Project: Flink
          Issue Type: Sub-task
            Reporter: Dong Lin


Currently, each invocation of IteratorSourceReaderBase#pollNext() push at most one record to the given ReaderOutput. This unnecessarily increases the average Java call stack depth needed to produce an element.

 

Take the following program as an example. For each element produced by this program, Flink runtime needs to include in the call stack these 4 function calls:
 * StreamTask#processInput()
 * StreamOneInputProcessor#processInput()
 * StreamTaskSourceInput#emitNext()
 * SourceOperator#emitNext()

{code:java}
DataStream<Long> stream = env.fromSequence(1, 1000000000L)
.map(x -> x)
.addSink(new DiscardingSink<>());
{code}
 

In comparison, SourceReaderBase#pollNext() is already using a while loop so that each invocation of this method could push as many records to the given ReaderOutput as possible.

 

This ticket proposes to update IteratorSourceReaderBase#pollNext() to push records to ReaderOutput in a while loop, for the following two reason:
 * It improves performance for programs that IteratorSourceReaderBase (e.g. env.fromSequence) by removing an average of 4 function from the call stack needed to produce a record.
 * It makes the behavior of IteratorSourceReaderBase and SourceReaderBase consistent with each other.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)