Posted to commits@samza.apache.org by "Roger Hoover (JIRA)" <ji...@apache.org> on 2015/07/23 19:48:04 UTC

[jira] [Updated] (SAMZA-741) Add support for versioning to Elasticsearch System Producer

     [ https://issues.apache.org/jira/browse/SAMZA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Hoover updated SAMZA-741:
-------------------------------
    Description: 
Versioning (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) lets you prevent duplicate messages from temporarily overwriting new versions of a document with old ones.
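To make the duplicate-message case concrete, here is a toy in-memory model of the check, assuming version_type=external semantics (a write is accepted only if its version is strictly greater than the stored one). The names index_external and VersionConflict are hypothetical and exist only for this illustration; this is not the Elasticsearch client API.

```python
class VersionConflict(Exception):
    """Raised when an incoming write carries a stale version (hypothetical name)."""


def index_external(store, doc_id, version, body):
    # Toy model of external versioning: accept the write only when the
    # supplied version is strictly greater than the one already stored.
    current = store.get(doc_id)
    if current is not None and version <= current[0]:
        raise VersionConflict(
            "doc %r: version %d is not newer than stored version %d"
            % (doc_id, version, current[0])
        )
    store[doc_id] = (version, body)


store = {}
index_external(store, "doc-1", 2, {"state": "new"})

# A replayed (duplicate) message with an older version is rejected,
# so the newer document is never overwritten, even temporarily.
try:
    index_external(store, "doc-1", 1, {"state": "old"})
    replay_rejected = False
except VersionConflict:
    replay_rejected = True
```

Without a version on the request, the replayed write would simply replace the newer document until the newer message was processed again.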

Currently, the Elasticsearch system producer does not support setting versions.  Since Kafka/Samza don't support message metadata besides a key (I think), the best approach seems to be to embed metadata into the stream name.

We can add version and version_type as options to the stream name.  These match the parameters of the Elasticsearch REST API (https://www.elastic.co/blog/elasticsearch-versioning-support).

{noformat}
{index-name}/{type-name}?version={version-id}&version_type={version-type}
{noformat}
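
As a rough sketch of how a producer might split such a stream name into its parts, the snippet below parses the proposed format with the Python standard library. The names EsTarget and parse_stream_name are hypothetical, and this is not the Samza implementation; it also assumes the version is an integer and that both options may be omitted.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs


class EsTarget(NamedTuple):
    """Parsed pieces of a stream name (hypothetical type for illustration)."""
    index: str
    doc_type: str
    version: Optional[int]
    version_type: Optional[str]


def parse_stream_name(stream: str) -> EsTarget:
    # Split "{index-name}/{type-name}" from the optional "?key=value&..." part.
    path, _, query = stream.partition("?")
    index, sep, doc_type = path.partition("/")
    if not sep or not index or not doc_type:
        raise ValueError("expected {index-name}/{type-name}, got %r" % stream)

    # parse_qs maps each key to a list of values; take the first if present.
    params = parse_qs(query)
    version = params.get("version", [None])[0]
    version_type = params.get("version_type", [None])[0]

    return EsTarget(
        index=index,
        doc_type=doc_type,
        version=int(version) if version is not None else None,
        version_type=version_type,
    )


target = parse_stream_name("logs/event?version=42&version_type=external")
```

A stream name without a query string, such as logs/event, parses to a target with no version set, so existing configurations would keep working unchanged under this sketch.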

  was:
Versioning (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) lets you prevent duplicate messages from temporarily overwriting new versions of a document with old ones.

Currently, the Elasticsearch system producer does not support setting versions.  Since Kafka/Samza don't support message metadata besides a key (I think), the best approach seems to be to embed metadata into the stream name.

We can add a version and version_type as options to the stream name.  These match up with Elasticsearch REST API (https://www.elastic.co/blog/elasticsearch-versioning-support)

{index-name}/{type-name}?version={version-id}&version_type={version-type}


> Add support for versioning to Elasticsearch System Producer
> -----------------------------------------------------------
>
>                 Key: SAMZA-741
>                 URL: https://issues.apache.org/jira/browse/SAMZA-741
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Roger Hoover
>            Priority: Minor
>             Fix For: 0.10.0
>
>
> Versioning (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) lets you prevent duplicate messages from temporarily overwriting new versions of a document with old ones.
> Currently, the Elasticsearch system producer does not support setting versions.  Since Kafka/Samza don't support message metadata besides a key (I think), the best approach seems to be to embed metadata into the stream name.
> We can add version and version_type as options to the stream name.  These match the parameters of the Elasticsearch REST API (https://www.elastic.co/blog/elasticsearch-versioning-support).
> {noformat}
> {index-name}/{type-name}?version={version-id}&version_type={version-type}
> {noformat}


