Posted to dev@flume.apache.org by "Ashish Paliwal (JIRA)" <ji...@apache.org> on 2014/03/12 07:35:49 UTC

[jira] [Resolved] (FLUME-2222) Duplicate entries in Elasticsearch when using Flume elasticsearch-sink

     [ https://issues.apache.org/jira/browse/FLUME-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Paliwal resolved FLUME-2222.
-----------------------------------

       Resolution: Not A Problem
    Fix Version/s: v1.5.0

Not a problem; it's expected behaviour.

> Duplicate entries in Elasticsearch when using Flume elasticsearch-sink
> ----------------------------------------------------------------------
>
>                 Key: FLUME-2222
>                 URL: https://issues.apache.org/jira/browse/FLUME-2222
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: centos 6
>            Reporter: Nikolaos Tsipas
>            Assignee: Ashish Paliwal
>              Labels: elasticsearch, sink
>             Fix For: v1.5.0
>
>         Attachments: Screen Shot 2013-10-29 at 12.36.01.png
>
>
> Hello,
> I'm using the Flume elasticsearch-sink to transfer logs from EC2 instances to Elasticsearch, and I'm getting duplicate entries for numerous documents.
> I noticed this issue when I sent a specific number of log lines to Elasticsearch through Flume and then counted them in Kibana to verify that all of them had arrived. Most of the time, especially when multiple Flume instances were used, I got duplicate entries, e.g. instead of receiving 10000 documents from an instance, I received 10060.
> The level of duplication seems to increase with the number of instances sending log data simultaneously, e.g. with 3 Flume instances I get 10060, and with 50 Flume instances I get 10300.
> Is duplication something I should expect when using the Flume elasticsearch-sink?
> There is a {{doRollback()}} method that is called on transaction failure, but I think it only rolls back the local Flume channel, not Elasticsearch.
> Any info/suggestions would be appreciated.
> Regards,
> Nick
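The reporter's reading of doRollback() matches how Flume's channel transactions work: a rollback puts the batch back into the channel, but it cannot retract documents the sink has already written to an external store, so the retried batch is written a second time. Delivery is therefore at-least-once, which is presumably why the issue was closed as expected behaviour. The Python sketch below is a hypothetical in-memory simulation of that sequence; the names FakeIndex and run_sink, and the assumption that a failure can occur partway through a batch after some documents were already indexed, are illustrative and are not the Flume or Elasticsearch API.

```python
from collections import deque

# Hypothetical simulation only; NOT the Flume or Elasticsearch API.

class FakeIndex:
    """Stands in for an Elasticsearch index: it keeps every document it receives."""

    def __init__(self):
        self.docs = []

    def bulk_index(self, batch, fail_after=None):
        # Write documents one by one; optionally fail partway through to mimic
        # an error reported after part of the batch was already indexed.
        for i, doc in enumerate(batch):
            if fail_after is not None and i == fail_after:
                raise IOError("simulated failure mid-batch")
            self.docs.append(doc)


def run_sink(events, batch_size, failures):
    """Drain a channel with at-least-once semantics.

    failures maps an attempt number to the position in the batch at which
    that attempt fails (assumed failure model).
    """
    channel = deque(events)
    index = FakeIndex()
    attempt = 0
    while channel:
        # "take": remove up to batch_size events inside a transaction
        batch = [channel.popleft() for _ in range(min(batch_size, len(channel)))]
        try:
            index.bulk_index(batch, fail_after=failures.get(attempt))
            # "commit": the events are now gone from the channel for good
        except IOError:
            # "rollback": the events return to the front of the channel in their
            # original order, but documents already written to the index stay there
            channel.extendleft(reversed(batch))
        attempt += 1
    return index.docs


docs = run_sink(range(10), batch_size=5, failures={0: 3})
print(len(docs))       # 13: events 0, 1 and 2 were indexed twice
print(len(set(docs)))  # 10: every event arrived at least once
```

With no failures the sink writes each event exactly once; each failed attempt that had already written part of its batch adds that many duplicates, so more concurrent senders hitting errors or timeouts would be expected to yield more duplicates, consistent with the counts reported above.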



--
This message was sent by Atlassian JIRA
(v6.2#6252)