Posted to common-user@hadoop.apache.org by Nguyen Manh Tien <ti...@gmail.com> on 2012/08/03 19:33:32 UTC

Number of concurrent writer to HDFS

Hi,
I plan to stream log data to HDFS using many writers; each writer writes a
stream of data to an HDFS file (which may rotate).

I wonder how many concurrent writers I should use. If you have experience
with this, please share your Hadoop cluster size, number of writers, and
replication factor.

Thanks.
Tien

Re: Number of concurrent writer to HDFS

Posted by alo alt <wg...@gmail.com>.
With Flume you could use batch mode: Flume waits until a given count of
events has been collected (let's say 100) and then bulk-writes them into
HDFS. On top of that you can set a timeout, meaning that if the batch size
is not reached within x seconds, the events are written out anyway. This is
useful for very small files (Avro, maybe) and reduces the stress on the
NameNode.

cheers,
Alex
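
As a concrete illustration of the batch/timeout behavior described above, a
minimal Flume NG HDFS sink configuration might look like the sketch below.
The agent, sink, and channel names (a1, k1, c1) and the path are made-up
examples; the hdfs.* property names are from the Flume NG HDFS sink.

```properties
# Hypothetical agent a1 with an HDFS sink k1 fed by channel c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/logs/%Y-%m-%d

a1.sinks.k1.hdfs.batchSize = 100     # flush to HDFS every 100 events
a1.sinks.k1.hdfs.rollInterval = 30   # also roll the file after 30 seconds
a1.sinks.k1.hdfs.rollSize = 0        # disable size-based rolling
a1.sinks.k1.hdfs.rollCount = 0       # disable count-based rolling
```

Batching many events per flush, plus time-based rolling, is what keeps the
file count (and thus NameNode load) down for small-event streams.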


Re: Number of concurrent writer to HDFS

Posted by Nguyen Manh Tien <ti...@gmail.com>.
You are correct.
I think the bottleneck may be the NameNode when there are too many small
files; HDFS is designed for big files, not for very many small files.
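
To put rough numbers on that NameNode pressure: a commonly cited rule of
thumb is that each namespace object (file or block) costs on the order of
150 bytes of NameNode heap. The 150-byte figure is an assumption here (the
real cost varies by Hadoop version and file attributes), but it makes the
small-file penalty easy to see:

```python
# Back-of-the-envelope NameNode heap estimate.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block);
# the exact cost varies by Hadoop version and file attributes.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Rough heap cost: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# Same data volume, packaged two ways:
# 100 million small files (1 block each) vs 100 thousand big files (100 blocks each)
small = namenode_heap_bytes(100_000_000, blocks_per_file=1)
big = namenode_heap_bytes(100_000, blocks_per_file=100)
print(small // 2**30, "GiB vs", big // 2**30, "GiB")  # → 27 GiB vs 1 GiB
```

The metadata cost is driven almost entirely by the object count, which is
why consolidating many small writes into fewer large files matters.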

Re: Number of concurrent writer to HDFS

Posted by Yanbo Liang <ya...@gmail.com>.
I think there is no hard limit on the number of files you can write to at
the same time, because each write stream goes out to its own set of
DataNodes, which are most likely different machines. It is similar to
MapReduce output, where each task writes its own separate file to HDFS and
there is no hard limit on the number of files written concurrently.
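
Even without a hard limit, every open file keeps a replication pipeline of
DataNodes busy, so one practical sanity check is the average number of
concurrent write streams landing on each DataNode. A simple sketch (the
cluster numbers are made up for illustration):

```python
# Each open HDFS file holds a write pipeline of `replication` DataNodes,
# so the cluster serves writers * replication concurrent streams in total.
def streams_per_datanode(writers, replication, datanodes):
    """Average concurrent write streams each DataNode serves."""
    return writers * replication / datanodes

# Hypothetical cluster: 500 concurrent writers, replication 3, 20 DataNodes
print(streams_per_datanode(500, 3, 20))  # 75.0 streams per node on average
```

If that number approaches the DataNode's configured transfer-thread limit,
writes will start to queue or fail, so it bounds the "sensible" writer count
in practice.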


Re: Number of concurrent writer to HDFS

Posted by Nguyen Manh Tien <ti...@gmail.com>.
@Yanbo, Alex: I want to develop a custom module that writes directly to
HDFS. The collector in Flume aggregates logs from many sources and writes
them into a few files. If I instead want to write to many files (for
example, one per source), I want to know how many files I can keep open in
that case.

Thanks.
Tien


Re: Number of concurrent writer to HDFS

Posted by Alex Baranau <al...@gmail.com>.
Also interested in this question.

@Yanbo: while we could use third-party tools to import/gather data into
HDFS, I believe the intention here is to write data to HDFS directly. It
would be great to hear what the "sensible" limitations are on the number of
files one can write to at the same time.

Thank you in advance,

Alex Baranau
------
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr



-- 
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

Re: Number of concurrent writer to HDFS

Posted by Yanbo Liang <ya...@gmail.com>.
You can use Scribe or Flume to collect log data; both integrate with Hadoop.
