Posted to user@flume.apache.org by Ben Yan <ya...@gmail.com> on 2019/09/19 06:20:42 UTC

Load balancing Sink Processor combined with hdfs sink does not work

Below is the configuration I am using. At runtime, only k2 writes data to
HDFS; k1 writes nothing. Why is the load balancing not taking effect?
The agent log shows no exceptions.


agent.sinkgroups.g1.sinks = k2 k1
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = false
agent.sinkgroups.g1.processor.selector = random

# properties of k1
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.rollInterval = 0
agent.sinks.k1.hdfs.rollCount = 5000000
agent.sinks.k1.hdfs.idleTimeout = 0
agent.sinks.k1.hdfs.batchSize = 50000
agent.sinks.k1.hdfs.rollSize = 0
agent.sinks.k1.hdfs.minBlockReplicas = 1
agent.sinks.k1.hdfs.maxOpenFiles = 20000
agent.sinks.k1.hdfs.threadsPoolSize = 5
agent.sinks.k1.hdfs.rollTimerPoolSize = 5
agent.sinks.k1.hdfs.path = hdfs:///data_log_stream/%Y%m%d/%H
agent.sinks.k1.hdfs.filePrefix = realtime.%[IP].k1
agent.sinks.k1.hdfs.fileSuffix = .gz
agent.sinks.k1.hdfs.inUsePrefix = _
agent.sinks.k1.hdfs.inUseSuffix = .inprogress
agent.sinks.k1.hdfs.emptyInUseSuffix = false
agent.sinks.k1.hdfs.codeC = gzip
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.useLocalTimeStamp = false
agent.sinks.k1.hdfs.closeTries = 0
agent.sinks.k1.hdfs.retryInterval = 180
agent.sinks.k1.hdfs.round = true
agent.sinks.k1.hdfs.roundValue = 60
agent.sinks.k1.hdfs.roundUnit = minute
#Specify the channel the sink should use
agent.sinks.k1.channel = file-channel
# properties of k2
agent.sinks.k2.type = hdfs
agent.sinks.k2.hdfs.rollInterval = 0
agent.sinks.k1.hdfs.rollCount = 5000000
agent.sinks.k2.hdfs.idleTimeout = 0
agent.sinks.k2.hdfs.batchSize = 50000
agent.sinks.k2.hdfs.rollSize = 0
agent.sinks.k2.hdfs.minBlockReplicas = 1
agent.sinks.k2.hdfs.maxOpenFiles = 20000
agent.sinks.k2.hdfs.threadsPoolSize = 5
agent.sinks.k2.hdfs.rollTimerPoolSize = 5
agent.sinks.k2.hdfs.path = hdfs:///data_log_stream/%Y%m%d/%H
agent.sinks.k2.hdfs.filePrefix = realtime.%[IP].k2
agent.sinks.k2.hdfs.fileSuffix = .gz
agent.sinks.k2.hdfs.inUsePrefix = _
agent.sinks.k2.hdfs.inUseSuffix = .inprogress
agent.sinks.k2.hdfs.emptyInUseSuffix = false
agent.sinks.k2.hdfs.codeC = gzip
agent.sinks.k2.hdfs.fileType = CompressedStream
agent.sinks.k2.hdfs.useLocalTimeStamp = false
agent.sinks.k2.hdfs.closeTries = 0
agent.sinks.k2.hdfs.retryInterval = 180
agent.sinks.k2.hdfs.round = true
agent.sinks.k2.hdfs.roundValue = 60
agent.sinks.k2.hdfs.roundUnit = minute
#Specify the channel the sink should use
agent.sinks.k2.channel = file-channel

Best,
Ben

Re: Load balancing Sink Processor combined with hdfs sink does not work

Posted by Bessenyei Balázs Donát <be...@apache.org>.
Hi Ben,

Can you please show the full configuration?


Thank you,

Donat

