Posted to user@flink.apache.org by Niels Basjes <Ni...@basjes.nl> on 2020/03/01 12:41:41 UTC

Writing a DataSet to ElasticSearch

Hi,

I have a job in Flink 1.10.0 which creates data that I need to write to
ElasticSearch.
Because the job really is a batch job (and running it as a stream keeps
causing OOM problems: big + unordered + groupby), I'm trying to run it as a
real batch.

To write a DataSet to some output (that is not a file) an OutputFormat
implementation is needed.

public DataSink<T> output(OutputFormat<T> outputFormat)

The problem is that I have not been able to find an "OutputFormat"
for ElasticSearch.
Adding ES as a Sink to a DataStream is trivial because a Sink is provided
out of the box.

The only alternative I came up with is to write the output of my batch to a
file and then load that (with a stream) into ES.

What is the proper solution?
Is there an OutputFormat for ES I can use that I overlooked?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Writing a DataSet to ElasticSearch

Posted by Robert Metzger <rm...@apache.org>.
Hey Niels,

For the OOM problem: did you try the RocksDB state backend?

I don't think there's an ES OutputFormat.

I guess there's no way around implementing your own OutputFormat for ES, if
you want to use the DataSet API. It should not be too hard to implement.
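
To give an idea of the shape such an OutputFormat could take, here is a
minimal sketch, not a tested connector. It deliberately avoids Flink and
Elasticsearch dependencies so it runs on its own: RecordOutput is a local
stand-in that mirrors the open/writeRecord/close lifecycle of Flink's
org.apache.flink.api.common.io.OutputFormat (the real interface also has
configure(Configuration) and extends Serializable). BulkSender, BulkingOutput,
the index name and the batch size are all hypothetical names chosen for
illustration. The sketch buffers records into a bulk request body (one action
line plus one source line per document) and hands the body to the sender when
the buffer is full or the format is closed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class EsOutputFormatSketch {

    /** Local stand-in for the lifecycle of Flink's OutputFormat (assumption: not the real interface). */
    public interface RecordOutput<T> {
        void open(int taskNumber, int numTasks) throws IOException;
        void writeRecord(T record) throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical hook that ships one bulk request body to Elasticsearch. */
    public interface BulkSender {
        void send(String ndjsonBody) throws IOException;
    }

    /** Buffers records and flushes them in batches of at most maxActions documents. */
    public static class BulkingOutput<T> implements RecordOutput<T> {
        private final String index;
        private final int maxActions;
        private final Function<T, String> toJson;
        private final BulkSender sender;
        private final StringBuilder body = new StringBuilder();
        private int pending;

        public BulkingOutput(String index, int maxActions,
                             Function<T, String> toJson, BulkSender sender) {
            this.index = index;
            this.maxActions = maxActions;
            this.toJson = toJson;
            this.sender = sender;
        }

        @Override
        public void open(int taskNumber, int numTasks) {
            // A real implementation would create its Elasticsearch client here.
            body.setLength(0);
            pending = 0;
        }

        @Override
        public void writeRecord(T record) throws IOException {
            // Bulk format: an action line followed by the document source line.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(toJson.apply(record)).append('\n');
            pending++;
            if (pending >= maxActions) {
                flush();
            }
        }

        @Override
        public void close() throws IOException {
            // Send whatever is still buffered; a real implementation would also close the client.
            flush();
        }

        private void flush() throws IOException {
            if (pending == 0) {
                return;
            }
            sender.send(body.toString());
            body.setLength(0);
            pending = 0;
        }
    }

    public static void main(String[] args) throws IOException {
        // Collect the bulk bodies in memory instead of calling Elasticsearch.
        List<String> sent = new ArrayList<>();
        BulkingOutput<String> out = new BulkingOutput<>(
                "my-index", 2, s -> "{\"msg\":\"" + s + "\"}", sent::add);
        out.open(0, 1);
        out.writeRecord("a");
        out.writeRecord("b");
        out.writeRecord("c");
        out.close();
        for (String bulkBody : sent) {
            System.out.print(bulkBody);
        }
    }
}
```

In an actual job the class would implement Flink's OutputFormat<T> and be
passed to DataSet.output(...). Because OutputFormat is Serializable, the
client should be created in open() rather than in the constructor, and the
converter and sender would need to be serializable as well. Error handling for
partially failed bulk requests is left out here.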

