Posted to issues@flink.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2020/05/28 02:56:00 UTC

[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box

     [ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated FLINK-16495:
-------------------------------
    Fix Version/s:     (was: 1.11.0)
                   1.12.0

> Improve default flush strategy for Elasticsearch sink to make it work out-of-box
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-16495
>                 URL: https://issues.apache.org/jira/browse/FLINK-16495
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / ElasticSearch, Table SQL / Ecosystem
>            Reporter: Jark Wu
>            Priority: Major
>              Labels: usability
>             Fix For: 1.12.0
>
>
> Currently, the Elasticsearch sink provides 3 flush options:
> {code:java}
> 'connector.bulk-flush.max-actions' = '42'
> 'connector.bulk-flush.max-size' = '42 mb'
> 'connector.bulk-flush.interval' = '60000'
> {code}
> All of them are optional and have no default value on the Flink side [1]. However, flush actions and flush size default to {{1000}} and {{5mb}} in the Elasticsearch client [2]. This leads to the surprising behavior that no results are output by default, see user report [3]: the sink waits for 1000 records, but the test does not produce that many.
> This is also a potential "problem" in production: in a low-throughput job, some data may take a very long time to become visible in Elasticsearch.
> In this issue, I propose that Flink define its own default values for these 3 options.
> {code:java}
> 'connector.bulk-flush.max-actions' = '1000'   -- same as the ES client default value
> 'connector.bulk-flush.max-size' = '5mb'  -- same as the ES client default value
> 'connector.bulk-flush.interval' = '5s'  -- ensures buffered records are eventually flushed
> {code}
> [1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L357-L356
> [2]: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html
> [3]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Should-I-use-a-Sink-or-Connector-Or-Both-td33352.html
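
For illustration, the behavior described in the issue can be sketched with a small simulation. This is not the Elasticsearch BulkProcessor or the Flink sink: the class FlushPolicySketch, its methods, and the simulated clock are all hypothetical, and the model assumes a flush fires when either the record count or the buffered size reaches its threshold, or when the interval has elapsed with records still buffered.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a bulk buffer; NOT the Elasticsearch BulkProcessor API. */
final class FlushPolicySketch {
    private final int maxActions;       // flush at this many buffered records; -1 disables
    private final long maxSizeBytes;    // flush at this many buffered bytes; -1 disables
    private final long intervalMillis;  // flush after this much simulated time; -1 disables

    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long nowMillis = 0;         // simulated clock, advanced only by advanceTo()
    private long lastFlushMillis = 0;
    private int flushedRecords = 0;

    FlushPolicySketch(int maxActions, long maxSizeBytes, long intervalMillis) {
        this.maxActions = maxActions;
        this.maxSizeBytes = maxSizeBytes;
        this.intervalMillis = intervalMillis;
    }

    /** Buffers one record and flushes if the count or size threshold is reached. */
    void add(String record) {
        buffer.add(record);
        bufferedBytes += record.getBytes(StandardCharsets.UTF_8).length;
        boolean countReached = maxActions > 0 && buffer.size() >= maxActions;
        boolean sizeReached = maxSizeBytes > 0 && bufferedBytes >= maxSizeBytes;
        if (countReached || sizeReached) {
            flush();
        }
    }

    /** Moves the simulated clock forward and applies the interval-based flush, if enabled. */
    void advanceTo(long newNowMillis) {
        nowMillis = newNowMillis;
        boolean intervalElapsed = intervalMillis > 0 && nowMillis - lastFlushMillis >= intervalMillis;
        if (intervalElapsed && !buffer.isEmpty()) {
            flush();
        }
    }

    int flushedRecords() {
        return flushedRecords;
    }

    private void flush() {
        flushedRecords += buffer.size();
        buffer.clear();
        bufferedBytes = 0;
        lastFlushMillis = nowMillis;
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024 * 1024;
        // Client-side defaults only (1000 actions, 5mb), no interval.
        FlushPolicySketch noInterval = new FlushPolicySketch(1000, fiveMb, -1);
        // Same thresholds plus the proposed 5s interval.
        FlushPolicySketch withInterval = new FlushPolicySketch(1000, fiveMb, 5000);

        // A small test run: only 10 records, far below both thresholds.
        for (int i = 0; i < 10; i++) {
            String record = "{\"id\":" + i + "}";
            noInterval.add(record);
            withInterval.add(record);
        }
        // One simulated minute passes.
        noInterval.advanceTo(60_000);
        withInterval.advanceTo(60_000);

        System.out.println("no interval: " + noInterval.flushedRecords() + " records flushed");
        System.out.println("5s interval: " + withInterval.flushedRecords() + " records flushed");
    }
}
```

In this model the run without an interval flushes nothing, while the run with a 5s interval flushes all 10 records, which matches the motivation for adding an interval default.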



--
This message was sent by Atlassian Jira
(v8.3.4#803005)