Posted to issues@flume.apache.org by "Selman Kayrancioglu (JIRA)" <ji...@apache.org> on 2019/02/25 10:04:00 UTC

[jira] [Created] (FLUME-3319) Is there a chance that TAILDIR Source skipping files?

Selman Kayrancioglu created FLUME-3319:
------------------------------------------

             Summary: Is there a chance that TAILDIR Source skipping files?
                 Key: FLUME-3319
                 URL: https://issues.apache.org/jira/browse/FLUME-3319
             Project: Flume
          Issue Type: Question
          Components: Sinks+Sources
    Affects Versions: 1.9.0
         Environment: {{flume-env.sh}}
{code:bash}
export JAVA_OPTS="-Xms100m -Xmx1000m -Dcom.sun.management.jmxremote -Dflume.root.logger=INFO,console -javaagent:/opt/flume/flume/jmx_prometheus_javaagent-0.11.0.jar=5000:/opt/flume/flume/jmx_exporter.yml"
{code}

{{java -version}}

{code}
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
{code}

{{OS}}:
{code}
Linux  4.9.127-32.el7.x86_64 #1 SMP Mon Sep 17 13:40:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
{code}


            Reporter: Selman Kayrancioglu


We are using TAILDIR Source + Kafka Sink with the following configuration:
{noformat}
tuzla2kafka.sources = tuzla
tuzla2kafka.channels = c1
tuzla2kafka.sinks = kafka

tuzla2kafka.sources.tuzla.type = TAILDIR
tuzla2kafka.sources.tuzla.channels = c1
tuzla2kafka.sources.tuzla.positionFile = /data/flume/positions/tuzla2kafka-taildir_position.json
tuzla2kafka.sources.tuzla.filegroups = tuzla_fluentd
tuzla2kafka.sources.tuzla.filegroups.tuzla_fluentd = /data/tuzla/fluentd/event_log_production.*.log

tuzla2kafka.channels.c1.type = file
tuzla2kafka.channels.c1.checkpointDir = /data/flume/file_channels/c1/checkpoint
tuzla2kafka.channels.c1.dataDirs = /data/flume/file_channels/c1/data
tuzla2kafka.channels.c1.capacity = 1000000

tuzla2kafka.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
tuzla2kafka.sinks.kafka.channel = c1
tuzla2kafka.sinks.kafka.kafka.topic = mini-pipeline
tuzla2kafka.sinks.kafka.kafka.bootstrap.servers = kafka1:9092,kafka2:9092
tuzla2kafka.sinks.kafka.kafka.batchSize = 10000
tuzla2kafka.sinks.kafka.kafka.allowTopicOverride = false
{noformat}

Log files in {{tuzla2kafka.sources.tuzla.filegroups.tuzla_fluentd}} are rotated hourly, and each one is ~1.5 GB. We have been testing this configuration for 3 days and noticed that Flume skipped 3 files in that time. We could not find the 'Opening file' / 'Closed file' messages in the Flume logs for these 3 files. Is this a known bug? We're trying to switch from {{fluentd}} to {{flume}}, and this behaviour would rule out {{flume}} as an alternative.
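
One way to narrow down which files were skipped is to compare the files on disk against the TAILDIR position file. The following is a minimal sketch, not part of Flume. It assumes the position file is a JSON array of objects with {{inode}}, {{pos}} and {{file}} keys, and that the last path component of the filegroup is a regular expression matched against the whole file name. The helper names {{load_positions}} and {{find_unread_files}} are hypothetical.

```python
import json
import os
import re


def load_positions(position_file):
    """Return {inode: pos} read from a TAILDIR position file.

    Assumption: the file is a JSON array of {"inode", "pos", "file"} objects.
    """
    with open(position_file) as fh:
        entries = json.load(fh)
    return {int(e["inode"]): int(e["pos"]) for e in entries}


def find_unread_files(filegroup, position_file):
    """List (path, reason) for files that look skipped or only partly read.

    Assumption: the directory part of `filegroup` is literal and the last
    component is a regex that must match the entire file name.
    """
    directory, name_regex = os.path.split(filegroup)
    matcher = re.compile(name_regex)
    positions = load_positions(position_file)

    problems = []
    for name in sorted(os.listdir(directory)):
        if not matcher.fullmatch(name):
            continue
        path = os.path.join(directory, name)
        st = os.stat(path)
        if st.st_ino not in positions:
            # No entry for this inode: the source never recorded a position.
            problems.append((path, "no entry in position file"))
        elif positions[st.st_ino] < st.st_size:
            # Recorded offset is behind the current file size.
            problems.append(
                (path, "read %d of %d bytes" % (positions[st.st_ino], st.st_size))
            )
    return problems
```

With the configuration above, it would be called as {{find_unread_files("/data/tuzla/fluentd/event_log_production.*.log", "/data/flume/positions/tuzla2kafka-taildir_position.json")}}. A file that is still being written will normally show up as partly read, so only entries for already rotated files are of interest. The script reports which files are affected; it does not explain why they were skipped.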


