Posted to commits@beam.apache.org by "Tim Robertson (JIRA)" <ji...@apache.org> on 2018/05/23 09:54:00 UTC

[jira] [Created] (BEAM-4389) Enable partial updates for Elasticsearch

Tim Robertson created BEAM-4389:
-----------------------------------

             Summary: Enable partial updates for Elasticsearch
                 Key: BEAM-4389
                 URL: https://issues.apache.org/jira/browse/BEAM-4389
             Project: Beam
          Issue Type: New Feature
          Components: io-java-elasticsearch
    Affects Versions: 2.4.0
            Reporter: Tim Robertson
            Assignee: Tim Robertson


Expose a configuration option on {{ElasticsearchIO}} that writes partial updates instead of full document inserts.

Rationale: We have a case where different pipelines process different categories of information about the same target entity (e.g. one for taxonomic processing, another for geospatial processing). A read and merge is not possible inside the batch call, so the only way to combine the results is through a join. The join approach is slow, and it also prevents running a single process in isolation (e.g. reprocessing the geospatial component of all docs).

This configuration parameter only makes sense when used in conjunction with controlling the document ID (possible since BEAM-3201).

The client API would include a {{withUsePartialUpdate(true)}} option, for example:

{code}
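source.apply(
  ElasticsearchIO.write()
    .withConnectionConfiguration(connectionConfiguration)
    // Take the document ID from the "id" field (see BEAM-3201)
    .withIdFn(new ExtractValueFn("id"))
    // Proposed option: send partial updates instead of full inserts
    .withUsePartialUpdate(true));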
{code}
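
To illustrate what the option might change in the request, here is a minimal sketch of the two kinds of entries in an Elasticsearch {{_bulk}} request body: an {{index}} action, which replaces the whole document, and an {{update}} action with {{doc}} and {{doc_as_upsert}}, which merges fields into an existing document. The class and method names are hypothetical and not part of {{ElasticsearchIO}}; how the IO would actually build its requests is an assumption. The sketch also assumes that the index (and the type, on older Elasticsearch versions) is given in the request URL and that IDs need no JSON escaping.

```java
// Sketch only: builds entries of an Elasticsearch _bulk request body.
// BulkPayloadSketch and its methods are hypothetical, not ElasticsearchIO API.
public class BulkPayloadSketch {

  // Full insert: the "index" action replaces any existing document with this ID.
  // Assumes the ID contains no characters that need JSON escaping.
  public static String indexEntry(String id, String documentJson) {
    return "{\"index\":{\"_id\":\"" + id + "\"}}\n" + documentJson + "\n";
  }

  // Partial update: the "update" action merges the given fields into the
  // existing document; "doc_as_upsert" creates the document if it is missing.
  // An explicit ID is required, which is why the ID function matters here.
  public static String partialUpdateEntry(String id, String documentJson) {
    return "{\"update\":{\"_id\":\"" + id + "\"}}\n"
        + "{\"doc\":" + documentJson + ",\"doc_as_upsert\":true}\n";
  }

  public static void main(String[] args) {
    // Two pipelines writing different fields of the same entity (ID "42").
    System.out.print(partialUpdateEntry("42", "{\"taxonomy\":\"Aves\"}"));
    System.out.print(partialUpdateEntry("42", "{\"country\":\"DK\"}"));
  }
}
```

With entries like these, the taxonomic and geospatial pipelines could each write only their own fields to the same document ID without overwriting the other's, which is the behaviour the rationale above asks for.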





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)