Posted to user@flume.apache.org by Shara Shi <sh...@dhgate.com> on 2012/08/28 07:08:57 UTC
Re: Re: Re: HDFS Sink Performance
HI Anchlia ,
If I use hadoop fs -put xxx xxx, the performance is fine; it is much faster
than Flume's.
Regards
Shara
From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
Sent: August 28, 2012 12:49
To: user@flume.apache.org
Subject: Re: Re: Re: HDFS Sink Performance
Do you get better performance when you write directly to the cluster? Can
you run some tests writing to the cluster directly and compare?
On Mon, Aug 27, 2012 at 8:19 PM, Shara Shi <sh...@dhgate.com> wrote:
Hi Denny
It is 20MB/min; I confirmed it.
I sent data with avro-client from the local machine to the Flume agent, and
I really did get 20MB/min.
So I am trying to find out the reason why.
Regards
Shara
From: Denny Ye [mailto:dennyy99@gmail.com]
Sent: August 28, 2012 11:02
To: user@flume.apache.org
Subject: Re: Re: HDFS Sink Performance
20MB/min or 20MB/sec?
I suspect there may be a presentation mistake. Can you confirm it?
-Regards
Denny Ye
2012/8/28 Shara Shi <sh...@dhgate.com>
Hi Denny
A throughput of 45MB/sec would be OK for me,
but I am only getting 20MB per minute.
What's wrong with my configuration?
Regards
Shara
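To make the gap concrete, here is a quick unit check comparing the 20MB/min Shara observes against the 45MB/sec Denny measured with MemoryChannel; the numbers are taken directly from the thread:

```python
# Sanity check on units: 20 MB/min observed vs. 45 MB/sec benchmark.
mb_per_min = 20
mb_per_sec = mb_per_min / 60       # throughput actually observed through Flume
benchmark = 45                     # MB/sec Denny measured with MemoryChannel
gap = benchmark / mb_per_sec       # how far below the benchmark this is
print(f"{mb_per_sec:.2f} MB/sec observed, {gap:.0f}x below the benchmark")
```

So the observed rate is roughly 0.33 MB/sec, more than two orders of magnitude below what the channel itself can sustain, which points at configuration rather than channel capacity.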
From: Denny Ye [mailto:dennyy99@gmail.com]
Sent: August 27, 2012 20:05
To: user@flume.apache.org
Subject: Re: HDFS Sink Performance
Hi Shara,
You are using MemoryChannel as the repository. I tested it locally with
updated code and got 45MB/sec without full GC. Is this your goal, or do you
need even higher throughput?
-Regards
Denny Ye
2012/8/27 Shara Shi <sh...@dhgate.com>
Hi All,
Whatever HDFS sink parameters I tune, I can't get performance above 20MB
per minute.
Is that normal? I think it is weird.
How can I improve it?
Regards
Ruihong Shi
==========================================
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# Define a memory channel called ch2 on collector2
collector2.channels.ch2.type = memory
collector2.channels.ch2.capacity=500000
collector2.channels.ch2.keep-alive=1
# Define an Avro source called avro-source1 on collector2 and tell it
# to bind to 0.0.0.0:41415. Connect it to channel ch2.
collector2.sources.avro-source1.channels = ch2
collector2.sources.avro-source1.type = avro
collector2.sources.avro-source1.bind = 0.0.0.0
collector2.sources.avro-source1.port = 41415
collector2.sources.avro-source1.threads = 10
# Define a hdfs sink
collector2.sinks.hdfs.channel = ch2
collector2.sinks.hdfs.type = hdfs
collector2.sinks.hdfs.hdfs.path = hdfs://namenode:8020/user/root/flume/webdata/exec/%Y/%m/%d/%H
collector2.sinks.hdfs.hdfs.batchSize = 50000
collector2.sinks.hdfs.runner.type=polling
collector2.sinks.hdfs.runner.polling.interval = 1
collector2.sinks.hdfs.hdfs.rollInterval = 120
collector2.sinks.hdfs.hdfs.rollSize =0
collector2.sinks.hdfs.hdfs.rollCount = 300000
collector2.sinks.hdfs.hdfs.fileType=DataStream
collector2.sinks.hdfs.hdfs.round =true
collector2.sinks.hdfs.hdfs.roundValue = 10
collector2.sinks.hdfs.hdfs.roundUnit = minute
collector2.sinks.hdfs.hdfs.threadsPoolSize = 10
collector2.sinks.hdfs.hdfs.rollTimerPoolSize = 10
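A small sketch of how the roll settings above interact at the observed throughput. Since rollSize=0 disables size-based rolling, files roll on whichever of rollInterval (120s) or rollCount (300000 events) fires first; the 500-byte average event size used here is a hypothetical assumption, not a figure from the thread:

```python
# Which roll trigger fires first under this config? rollSize=0 disables
# size-based rolling, so it is rollInterval vs. rollCount.
roll_interval_s = 120                              # hdfs.rollInterval
roll_count = 300_000                               # hdfs.rollCount
observed_bytes_per_sec = 20 * 1024 * 1024 / 60     # the reported 20 MB/min
avg_event_bytes = 500                              # hypothetical assumption
events_per_sec = observed_bytes_per_sec / avg_event_bytes
secs_until_count_roll = roll_count / events_per_sec
print(f"~{events_per_sec:.0f} events/sec; count roll would take "
      f"{secs_until_count_roll:.0f}s, so the {roll_interval_s}s interval "
      f"roll fires first")
```

Under that assumption the 120-second interval roll always wins, so at this throughput the rollCount setting never takes effect.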
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
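The activation lines the comment refers to were not included in the quoted config. Based on the component names defined above, they would presumably look like this (a reconstruction, not part of the original message):

```
collector2.channels = ch2
collector2.sources = avro-source1
collector2.sinks = hdfs
```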