Posted to user@flink.apache.org by aviad <ro...@gmail.com> on 2018/10/02 11:52:41 UTC
hadoopInputFormat and elasticsearch
Hi,
I want to write a batch job that reads data from *elasticsearch* using
*elasticsearch-hadoop* (https://github.com/elastic/elasticsearch-hadoop/)
and *hadoopInputFormat*.
example code (from
https://github.com/genged/flink-playground/blob/master/src/main/java/com/mic/flink/FlinkMain.java):
elasticsearch-hadoop creates one Hadoop InputSplit (task) per Elasticsearch
shard, so if my index has 20 shards, it will be split into 20 InputSplits.
My question is:
What will happen if my job restarts (failover) after finishing half of the
InputSplits?
Does hadoopInputFormat remember which InputSplits are finished and know how
to continue from where it stopped (maybe by reading the unfinished
InputSplits from their beginning), or does it start over from the beginning?
Thanks
Re: hadoopInputFormat and elasticsearch
Posted by Andrey Zagrebin <an...@data-artisans.com>.
Hi,
At the moment, if the processing of any input split fails,
Flink restarts the batch job completely from scratch, so InputSplits that
had already finished are read again.
There is an ongoing effort to improve fine-grained recovery in FLINK-4256.
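
To make the cost of a full restart concrete, here is a toy simulation in plain Java. It does not use Flink or elasticsearch-hadoop; the class and method names are hypothetical, and it assumes, for simplicity, that splits are read one after another and that a single failure occurs. It only illustrates the behaviour described above: nothing is remembered across attempts, so splits that had already finished are read again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Flink code. All names here are hypothetical.
public class FullRestartSketch {

    /**
     * Simulates a batch job over numSplits input splits where the first
     * attempt fails when it reaches split index failAtSplit.
     * Returns the ids of all splits read, in order, across all attempts.
     */
    public static List<Integer> runWithFullRestart(int numSplits, int failAtSplit) {
        List<Integer> reads = new ArrayList<>();
        boolean failureInjected = false;
        boolean finished = false;
        while (!finished) {
            finished = true;
            // Every attempt starts again at split 0: no progress is kept.
            for (int split = 0; split < numSplits; split++) {
                if (!failureInjected && split == failAtSplit) {
                    failureInjected = true; // fail once, then restart the job
                    finished = false;
                    break;
                }
                reads.add(split);
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        // 20 shards -> 20 splits; the failure hits after 10 splits are done.
        List<Integer> reads = runWithFullRestart(20, 10);
        System.out.println("total split reads: " + reads.size());
    }
}
```

With 20 splits and a failure after the first 10, the simulated job performs 30 split reads in total: 10 in the failed attempt plus all 20 in the second one. A real Flink job reads splits in parallel, so the exact numbers will differ, but the already-finished work is repeated in the same way.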
Best,
Andrey