Posted to hdfs-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2014/01/26 18:32:37 UTC

HDFS open file limit

I have an application that wants to open a large set of files in HDFS simultaneously.  Are there hard or practical limits to what can be opened at once by a single process?  By the entire cluster in aggregate?
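
Concretely, the pattern looks roughly like this (a sketch against the
standard FileSystem API; the path pattern and the count are made up for
illustration):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BulkOpen {
      public static void main(String[] args) throws IOException {
        // Connects to the fs.defaultFS named in core-site.xml.
        FileSystem fs = FileSystem.get(new Configuration());
        List<FSDataInputStream> streams = new ArrayList<FSDataInputStream>();
        try {
          // Open the whole set up front and hold every stream at once.
          for (int i = 0; i < 10000; i++) {
            streams.add(fs.open(new Path("/data/part-" + i)));
          }
          // ... the application then reads from all of them ...
        } finally {
          for (FSDataInputStream in : streams) {
            in.close();
          }
        }
      }
    }
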
Thanks
John



RE: HDFS open file limit

Posted by John Lilley <jo...@redpoint.net>.
What exception would I expect to get if this limit was exceeded?
john

From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, January 27, 2014 8:12 AM
To: <us...@hadoop.apache.org>
Subject: Re: HDFS open file limit


Hi John,

There is a concurrent connections limit on the DNs that's set to a default of 4k max parallel threaded connections for reading or writing blocks. This is also expandable via configuration, but usually the default value suffices even for pretty large operations, given that replicas help spread read load around.

Beyond this, you will mostly just run into configurable OS limitations.
On Jan 26, 2014 11:03 PM, "John Lilley" <jo...@redpoint.net> wrote:
I have an application that wants to open a large set of files in HDFS simultaneously.  Are there hard or practical limits to what can be opened at once by a single process?  By the entire cluster in aggregate?
Thanks
John



Re: HDFS open file limit

Posted by Harsh J <ha...@cloudera.com>.
Hi John,

There is a concurrent connections limit on the DNs that's set to a default
of 4k max parallel threaded connections for reading or writing blocks. This
is also expandable via configuration, but usually the default value suffices
even for pretty large operations, given that replicas help spread read load
around.

Beyond this, you will mostly just run into configurable OS limitations.
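
For reference, the knob behind this cap is dfs.datanode.max.transfer.threads
(dfs.datanode.max.xcievers in 1.x releases), which defaults to 4096. A sketch
of raising it in each DataNode's hdfs-site.xml, with an illustrative value:

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>8192</value>
    </property>

The DataNodes need a restart to pick this up. When the cap is hit, the
DataNode rejects the transfer with an IOException along the lines of
"Xceiver count ... exceeds the limit of concurrent xcievers", and the
client's read or write fails.
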
On Jan 26, 2014 11:03 PM, "John Lilley" <jo...@redpoint.net> wrote:

>  I have an application that wants to open a large set of files in HDFS
> simultaneously.  Are there hard or practical limits to what can be opened
> at once by a single process?  By the entire cluster in aggregate?
>
> Thanks
>
> John
>
>
>
>
>

Re: HDFS open file limit

Posted by sudhakara st <su...@gmail.com>.
There is no open-file limitation in HDFS itself. The 'Too many open files'
error comes from the OS. Increase the system-wide maximum number of open
files and the per-user/group/process file descriptor limits.
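
On Linux, for example, these can be checked and raised along these lines
(the values and the "hdfs" user are only illustrative):

    # System-wide ceiling on open file handles
    cat /proc/sys/fs/file-max
    sysctl -w fs.file-max=2097152

    # Persistent per-user limits, set in /etc/security/limits.conf:
    #   hdfs  soft  nofile  64000
    #   hdfs  hard  nofile  64000
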


On Mon, Jan 27, 2014 at 1:52 AM, Bertrand Dechoux <de...@gmail.com> wrote:

> At least for each machine, there is the *ulimit* that needs to be verified.
>
> Regards
>
> Bertrand
>
> Bertrand Dechoux
>
>
> On Sun, Jan 26, 2014 at 6:32 PM, John Lilley <jo...@redpoint.net> wrote:
>
>>  I have an application that wants to open a large set of files in HDFS
>> simultaneously.  Are there hard or practical limits to what can be opened
>> at once by a single process?  By the entire cluster in aggregate?
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: HDFS open file limit

Posted by Bertrand Dechoux <de...@gmail.com>.
At least for each machine, there is the *ulimit* that needs to be verified.
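
For example, to see the current soft and hard ceilings on a machine:

    ulimit -Sn   # soft limit for the current shell
    ulimit -Hn   # hard limit the soft limit can be raised to
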

Regards

Bertrand

Bertrand Dechoux


On Sun, Jan 26, 2014 at 6:32 PM, John Lilley <jo...@redpoint.net> wrote:

>  I have an application that wants to open a large set of files in HDFS
> simultaneously.  Are there hard or practical limits to what can be opened
> at once by a single process?  By the entire cluster in aggregate?
>
> Thanks
>
> John
>
>
>
>
>
