You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Vilhelm von Ehrenheim <vo...@gmail.com> on 2017/11/15 08:51:54 UTC

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

I second this. In my case i have a lot of stateful updates in my stream so
it doesnt work at all. I ended up using Kafka between to mitigate the issue
but it would be great with an IO connector that writes on keys directly.

I think a PR is always welcome. :)

/ Vilhelm

On 15 Nov 2017 05:16, "Chet Aldrich" <ch...@postmates.com> wrote:

> Hello all!
>
> So I’ve been using the ElasticSearchIO sink for a project (unfortunately
> it’s Elasticsearch 5.x, and so I’ve been messing around with the latest RC)
> and I’m finding that it doesn’t allow for changing the document ID, but
> only lets you pass in a record, which means that the document ID is
> auto-generated. See this line for what specifically is happening:
>
> https://github.com/apache/beam/blob/master/sdks/java/io/
> elasticsearch/src/main/java/org/apache/beam/sdk/io/
> elasticsearch/ElasticsearchIO.java#L838
>
> Essentially the data part of the document is being placed but it doesn’t
> allow for other properties, such as the document ID, to be set.
>
> This leads to two problems:
>
> 1. Beam doesn’t necessarily guarantee exactly-once execution for a given
> item in a PCollection, as I understand it. This means that you may get more
> than one record in Elastic for a given item in a PCollection that you pass
> in.
>
> 2. You can’t do partial updates to an index. If you run a batch job once,
> and then run the batch job again on the same index without clearing it, you
> just double everything in there.
>
> Is there any good way around this?
>
> I’d be happy to try writing up a PR for this in theory, but not sure how
> to best approach it. Also would like to figure out a way to get around this
> in the meantime, if anyone has any ideas.
>
> Best,
>
> Chet
>
> P.S. CCed echauchot@gmail.com because it seems like he’s been doing work
> related to the elastic sink.
>
>
>