Posted to dev@samza.apache.org by Roger Hoover <ro...@gmail.com> on 2015/07/23 19:48:27 UTC

How to map document version to the Elasticsearch System Producer?

Hi Dan and Samza devs,

I have a use case for which I need to set an external version on
Elasticsearch documents.  Versioning (
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning)
lets you prevent duplicate messages from temporarily overwriting new
versions of a document with old ones.
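
To make the intended behaviour concrete, here is a toy in-memory model of
external versioning. It is only a sketch of the semantics (with
version_type=external, a write is accepted only when its version is
strictly greater than the stored one). ToyIndex, VersionConflict and
apply_all are hypothetical names for illustration, not part of any
Elasticsearch or Samza API.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class ToyIndex:
    """Hypothetical stand-in for an index; it only models the version check."""

    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version):
        current = self.docs.get(doc_id)
        # External versioning: reject unless the new version is strictly greater.
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= {current[0]}")
        self.docs[doc_id] = (version, source)


def apply_all(index, messages):
    """Apply (doc_id, source, version) messages in order, dropping stale ones."""
    for doc_id, source, version in messages:
        try:
            index.index(doc_id, source, version)
        except VersionConflict:
            pass  # a replayed or older message must not overwrite newer data
```

With this check in place, a replayed message carrying version 1 that
arrives after version 2 is rejected instead of overwriting the newer
document.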

Currently, the Elasticsearch system producer does not support setting
versions.  Since Kafka/Samza don't have support for key/value headers in
messages, I think the best approach is to embed metadata into the stream
name.

We can add version and version_type as options to the stream name,
matching the parameters of the Elasticsearch REST API (
https://www.elastic.co/blog/elasticsearch-versioning-support),
so that a stream name takes the form:

{index-name}/{type-name}?version={version-id}&version_type={version-type}
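
As a rough sketch of how a producer might split such a stream name into
its parts: the function name parse_stream_name and the returned dict
layout are assumptions for illustration, and the sketch assumes the
version is a literal integer in the stream name. It is not the actual
Samza implementation.

```python
from urllib.parse import parse_qs


def parse_stream_name(stream):
    """Split '{index}/{type}?version=..&version_type=..' into its components.

    Hypothetical helper; both query parameters are treated as optional.
    """
    path, _, query = stream.partition("?")
    index_name, sep, type_name = path.partition("/")
    if not sep or not index_name or not type_name:
        raise ValueError(f"expected 'index/type' but got {path!r}")

    params = parse_qs(query)  # maps each key to a list of values
    version = params.get("version", [None])[0]
    version_type = params.get("version_type", [None])[0]

    return {
        "index": index_name,
        "type": type_name,
        # Assumption: the version is a literal integer embedded in the name.
        "version": int(version) if version is not None else None,
        "version_type": version_type,
    }
```

A stream name without a query string would still parse, with version and
version_type left unset, so existing configurations would keep working.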

I've created a JIRA (https://issues.apache.org/jira/browse/SAMZA-741).  I'd
appreciate your feedback.

Thanks,

Roger