Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/04 09:47:42 UTC
[jira] [Commented] (FLINK-6215) Make the StatefulSequenceSource scalable.
[ https://issues.apache.org/jira/browse/FLINK-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954925#comment-15954925 ]
ASF GitHub Bot commented on FLINK-6215:
---------------------------------------
GitHub user kl0u opened a pull request:
https://github.com/apache/flink/pull/3669
[FLINK-6215] Make the StatefulSequenceSource scalable.
Until now, this source computed all the elements to
be emitted up front and stored them in memory. This could
lead to out-of-memory problems for large deployments. Now
we split the range of elements into partitions that
can be redistributed upon rescaling, and on each
checkpoint we store only the next offset and the end
of each partition.
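For readers following along, here is a minimal sketch of the partitioning idea described above. It is not the code from the PR: the class and method names (SequencePartitions, Partition, split, assign) are hypothetical, and the round-robin assignment is only one plausible way to redistribute partitions across subtasks. It also assumes the range length fits in a long.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not the StatefulSequenceSource implementation.
public class SequencePartitions {

    /** Per-partition state: the next value to emit and the inclusive end. */
    static final class Partition {
        long next;
        final long end;

        Partition(long next, long end) {
            this.next = next;
            this.end = end;
        }

        boolean hasNext() {
            return next <= end;
        }

        long emit() {
            return next++;
        }
    }

    /** Splits [start, end] into at most maxParallelism contiguous, non-empty partitions. */
    static List<Partition> split(long start, long end, int maxParallelism) {
        if (maxParallelism <= 0) {
            throw new IllegalArgumentException("maxParallelism must be positive");
        }
        List<Partition> partitions = new ArrayList<>();
        if (end < start) {
            return partitions;
        }
        long total = end - start + 1; // assumption: the range length fits in a long
        long base = total / maxParallelism;
        long remainder = total % maxParallelism;
        long lo = start;
        for (int i = 0; i < maxParallelism; i++) {
            // The first `remainder` partitions take one extra element.
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // fewer elements than partitions
            }
            partitions.add(new Partition(lo, lo + size - 1));
            lo += size;
        }
        return partitions;
    }

    /** Assigns partitions to one subtask round-robin; rerun with a new parallelism to rescale. */
    static List<Partition> assign(List<Partition> all, int subtaskIndex, int parallelism) {
        List<Partition> mine = new ArrayList<>();
        for (int i = subtaskIndex; i < all.size(); i += parallelism) {
            mine.add(all.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Partition> all = split(0, 9, 4);
        int parallelism = 2;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (Partition p : assign(all, subtask, parallelism)) {
                System.out.println("subtask " + subtask + " owns [" + p.next + ", " + p.end + "]");
            }
        }
    }
}
```

In this sketch the checkpointed state is just two longs per partition, so its size depends on the number of partitions rather than on the length of the sequence.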
The current version of the PR is not backwards compatible,
as this becomes tricky given that we change the semantics
of the state that we store.
I believe that this is ok, given that it is a fix that has to go into
1.3 and we are not sure whether people are actually using this source in
production, i.e. in settings that need backwards compatibility.
What do you think, @aljoscha @StephanEwen?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kl0u/flink stateful-src
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3669.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3669
----
commit cf333b0b0c318569a1704ca71121c37dcd12bd3d
Author: kl0u <kk...@gmail.com>
Date: 2017-03-29T16:21:02Z
[FLINK-6215] Make the StatefulSequenceSource scalable.
Until now, this source computed all the elements to
be emitted up front and stored them in memory. This could
lead to out-of-memory problems for large deployments. Now
we split the range of elements into partitions that
can be redistributed upon rescaling, and on each
checkpoint we store only the next offset and the end
of each partition.
----
> Make the StatefulSequenceSource scalable.
> -----------------------------------------
>
> Key: FLINK-6215
> URL: https://issues.apache.org/jira/browse/FLINK-6215
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Fix For: 1.3.0
>
>
> Currently the {{StatefulSequenceSource}} instantiates all the elements to emit up front and keeps them in memory. This does not scale: for large sequences of elements it can lead to out-of-memory errors.
> To solve this, we can pre-partition the sequence of elements based on the {{maxParallelism}} parameter and keep (and checkpoint) state only per partition.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)