Posted to common-user@hadoop.apache.org by Ana Gillan <an...@gmail.com> on 2014/08/12 00:17:18 UTC

ulimit for Hive

Hi,

I've been reading a lot of posts about needing to set a high ulimit for file
descriptors in Hadoop and I think it's probably the cause of a lot of the
errors I've been having when trying to run queries on larger data sets in
Hive. However, I'm really confused about how and where to set the limit, so
I have a number of questions:
1. How high is it recommended to set the ulimit?
2. What is the difference between soft and hard limits? Which one needs to
be set to the value from question 1?
3. For which user(s) do I set the ulimit? If I am running the Hive query
with my login, do I set my own ulimit to the high value?
4. Do I need to set this limit for these users on all the machines in the
cluster? (we have one master node and 6 slave nodes)
5. Do I need to restart anything after configuring the ulimit?
Thanks in advance,
Ana



Re: ulimit for Hive

Posted by Chris MacKenzie <st...@chrismackenziephotography.co.uk>.
Hi Zhijie,

The ulimit command covers both the hard and the soft limits.

The hard limit can only be raised by a sys admin. It acts as a ceiling that
guards against things like a fork bomb DoS attack.
The sys admin can set the hard limit per user, e.g. for hadoop_user.
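
As a rough sketch of what a per-user setting can look like: on Linux systems that use pam_limits, per-user limits are usually listed in /etc/security/limits.conf or a file under /etc/security/limits.d/, one entry per line in the form <domain> <type> <item> <value>. The helper below only prints such entries. The function name nofile_entries and the value 32768 are made up for illustration and are not a recommendation from this thread.

```shell
# Print pam_limits entries that set the soft and hard open-file limit
# (item "nofile") for a single user. Nothing is written to disk here;
# a sys admin would review the output and place it in the limits file.
nofile_entries() {
    user="$1"
    value="$2"
    printf '%s soft nofile %s\n' "$user" "$value"
    printf '%s hard nofile %s\n' "$user" "$value"
}

# Illustrative call: hadoop_user is the example user from the text above,
# 32768 is an assumed value.
nofile_entries hadoop_user 32768
```

Entries like these are read when a new session is opened for the user, so processes that are already running keep the limits they started with.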

A user can add a line to their .profile file setting a soft ulimit up to
the hard limit. You can google how to do that.
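
A minimal sketch of such a .profile line, assuming the limit of interest is the number of open file descriptors (ulimit -n) and a shell such as bash or dash that supports that option. The function name raise_nofile_soft and the target of 32768 are hypothetical, chosen only for illustration.

```shell
# Raise the soft open-file limit for this shell and its children,
# never asking for more than the hard limit allows.
raise_nofile_soft() {
    want="$1"
    cur="$(ulimit -S -n)"
    max="$(ulimit -H -n)"
    # Cap the request at the hard limit unless the hard limit is unlimited.
    if [ "$max" != "unlimited" ] && [ "$want" -gt "$max" ]; then
        want="$max"
    fi
    # Only raise; leave an already-higher soft limit alone.
    if [ "$cur" != "unlimited" ] && [ "$cur" -lt "$want" ]; then
        ulimit -S -n "$want"
    fi
}

# Assumed target value; pick one that suits the cluster.
raise_nofile_soft 32768
```

The new soft limit applies to the shell that runs this line and to anything started from it afterwards.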

You can check the ulimits like so:

ulimit -H -a   # hard limits
ulimit -S -a   # soft limits
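
To look at just the file descriptor limit rather than the full list, something along these lines may help. It is a sketch: the function name nofile_limits is made up, and the /proc part only applies on Linux.

```shell
# Report the soft and hard open-file limits of the current shell.
nofile_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -S -n)" "$(ulimit -H -n)"
}
nofile_limits

# On Linux, the limits of a running process can be read from /proc.
# /proc/self refers to the process doing the reading; substitute the PID
# of a daemon to inspect that daemon instead.
if [ -r /proc/self/limits ]; then
    grep 'Max open files' /proc/self/limits
fi
```

A process keeps the limits it was started with, so checking the daemon's own entry under /proc shows whether a changed setting has reached it.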

The max value for the hard limit is unlimited. I currently have mine set
to this as I was running out of processes (nproc).

I don't know about restarting, but I think so.
I don't know about Hive.



Warm regards.

Chris

telephone: 0131 332 6967
email: studio@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
<http://twitter.com/#!/MacKenzieStudio>
<http://www.linkedin.com/in/chrismackenziephotography/>




From:  Zhijie Shen <zs...@hortonworks.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Tuesday, 12 August 2014 18:33
To:  <us...@hadoop.apache.org>, <us...@hive.apache.org>
Subject:  Re: ulimit for Hive


+ Hive user mailing list
It should be a better place for your questions.



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity
to which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified
that any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender
immediately and delete it from your system. Thank You.



Re: ulimit for Hive

Posted by Zhijie Shen <zs...@hortonworks.com>.
+ Hive user mailing list

It should be a better place for your questions.


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/
