You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Vishal Santoshi <vi...@gmail.com> on 2019/03/18 16:31:47 UTC

Is a subtask index and kafka partition affinity retained after a restore ?

If a flink pipeline with a kafka source was to be re stored from a
checkpoint, would a partition be assigned to a the same sub task index ? It
does not seem so, but wanted to confirm. We are trying to simulate an edge
case where we have dangling files. We create these files by  forcing 2 or
more restores between 2 consecutive checkpoints. These files are created
after the first checkpoint but are rendered redundant by another restore
before the next checkpoint. We want to be sure that all data in these files
are accounted for by checking for the records in resolved files but were
not sure we should check the ones with the same task index or all sub
indexes...


for example, if we have 2 sub task index ids
if _in-progress-part-1-1 is  a dangling/redundant file would that data be
in part-1-2 ( the next finalized file but with the same sub task id ) or
could it be in part-2-2 plus for example ? If the partitions were to
maintain affinity, we would imagine that it would be former but wanted to
confirm.

Regards.

Re: Is a subtask index and kafka partition affinity retained after a restore ?

Posted by Vishal Santoshi <vi...@gmail.com>.
We are almost sure that there is no affinity.

On Mon, Mar 18, 2019, 12:31 PM Vishal Santoshi <vi...@gmail.com>
wrote:

> If a flink pipeline with a kafka source was to be re stored from a
> checkpoint, would a partition be assigned to a the same sub task index ? It
> does not seem so, but wanted to confirm. We are trying to simulate an edge
> case where we have dangling files. We create these files by  forcing 2 or
> more restores between 2 consecutive checkpoints. These files are created
> after the first checkpoint but are rendered redundant by another restore
> before the next checkpoint. We want to be sure that all data in these files
> are accounted for by checking for the records in resolved files but were
> not sure we should check the ones with the same task index or all sub
> indexes...
>
>
> for example, if we have 2 sub task index ids
> if _in-progress-part-1-1 is  a dangling/redundant file would that data be
> in part-1-2 ( the next finalized file but with the same sub task id ) or
> could it be in part-2-2 plus for example ? If the partitions were to
> maintain affinity, we would imagine that it would be former but wanted to
> confirm.
>
> Regards.
>