Posted to hdfs-dev@hadoop.apache.org by Sreekanth Ramakrishnan <sr...@inmobi.com> on 2013/04/04 13:28:40 UTC

Fwd: Max DataXceiver Exceeded Logs in DataNode log files

Posting my previous mail from issues mailing list to dev mailing list.

Hi All,

We are currently running a Hadoop 0.20.X cluster in our environment. Recently we have been observing slowdowns of the datanodes and DFSClient timeouts. Looking at the datanode logs, we noticed quite a few "max DataXceiver exceeded" exception messages of the following format:

java.io.IOException: xceiverCount 4114 exceeds the limit of concurrent xcievers 4096

Our cluster configuration allows a maximum of 4096 DataXceivers. Because of this exception, our DFS clients are getting blocked, which slows down DFS performance from the client's perspective.

When we checked a jstack of the datanode process, it showed that out of 4166 active threads in the JVM, 1336 were DataXceiver threads and 2796 were PacketResponder threads. Shouldn't the DataNode spawn 2760 more DataXceivers (4096 - 1336) before throwing the IOException?

Also, looking at the code, it seems that we are not setting a different thread group for BlockReceiver, which causes the thread pool to be split between BlockReceiver and DataXceiver. Is this intentional?

Are there any workarounds to ensure that the maximum allocation of threads goes to DataXceiver?

Or should I go ahead and file a JIRA regarding this issue?

Sreekanth Ramakrishnan

--
_____________________________________________________________
The information contained in this communication is intended solely for the
use of the individual or entity to whom it is addressed and others
authorized to receive it. It may contain confidential or legally privileged
information. If you are not the intended recipient you are hereby notified
that any disclosure, copying, distribution or taking any action in reliance
on the contents of this information is strictly prohibited and may be
unlawful. If you have received this communication in error, please notify
us immediately by responding to this email and then delete it from your
system. The firm is neither liable for the proper and complete transmission
of the information contained in this communication nor for any delay in its
receipt.

Re: Max DataXceiver Exceeded Logs in DataNode log files

Posted by Chris Nauroth <cn...@hortonworks.com>.

Hello Sreekanth,

The threads that you see named "PacketResponder" are in fact the threads
allocated from the BlockReceiver class.  As you noticed, they are placed
into the same thread group as the threads named "DataXceiver" allocated
from the DataXceiverServer class.  The current count is determined by the
activeCount of the thread group, and the sum of those threads (2796 + 1336)
has exceeded the configured max of 4096.  (Note that
ThreadGroup.activeCount is not precise, so in this example, the total has
gone a bit above 4096.)
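
To illustrate the counting described above, here is a minimal, self-contained sketch in plain Java. It is not the DataNode code: the class name, method names, and thread names are made up for the example, and the message text is simply copied from the log line in the original mail. It only shows that threads of two different kinds started in the same ThreadGroup are both included in ThreadGroup.activeCount(), so a single limit checked against that count covers both.

```java
import java.util.concurrent.CountDownLatch;

public class SharedThreadGroupDemo {

    // Hypothetical stand-in for the limit check: returns the error text
    // when count exceeds max, otherwise null.
    static String check(int count, int max) {
        if (count > max) {
            return "xceiverCount " + count
                + " exceeds the limit of concurrent xcievers " + max;
        }
        return null;
    }

    // Starts "receiver" and "responder" threads in ONE thread group, waits
    // until all of them are running, and returns the group's activeCount().
    static int activeCountWith(int receivers, int responders) {
        ThreadGroup group = new ThreadGroup("demoGroup");
        CountDownLatch started = new CountDownLatch(receivers + responders);
        CountDownLatch release = new CountDownLatch(1);
        Runnable parked = () -> {
            started.countDown();
            try {
                release.await(); // stay alive until the count has been read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < receivers; i++) {
            startDaemon(group, parked, "DataXceiver-" + i);
        }
        for (int i = 0; i < responders; i++) {
            startDaemon(group, parked, "PacketResponder-" + i);
        }
        try {
            started.await();
            // Both kinds of thread are counted, because both are in the group.
            return group.activeCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the parked threads exit
        }
    }

    private static void startDaemon(ThreadGroup group, Runnable body, String name) {
        Thread t = new Thread(group, body, name);
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) {
        int count = activeCountWith(3, 2);
        System.out.println("activeCount = " + count);
        System.out.println("check against limit 4: " + check(count, 4));
    }
}
```

In this quiescent demo the count is stable, but as noted above activeCount() is only an estimate on a busy process, so a real total can drift slightly past the limit.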

This is expected behavior.  The "DataXceiver" and "PacketResponder" threads
are designed to work together as a pair: a receiver and a responder.  There
is no way to configure the limits separately for "DataXceiver" vs.
"PacketResponder".  Since the 2 threads need to cooperate as a pair, I
don't think we'd want to provide the capability to configure the limits
independently either.  (This could potentially cause confusing error
scenarios where the 2 are configured with different values for the limits,
and you have capacity for one kind of thread but not the other.)  The only
way to tune the max thread count is the max xceiver configuration parameter
that you are already using, and you can think of this as the thread pool
size for the whole thing.
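
Since the limit applies to the combined total, one way to see how close a datanode is to it is to tally both thread kinds from a jstack dump and subtract the sum from the configured maximum. The following is a rough sketch under stated assumptions: the class and method names are hypothetical, it assumes each thread's header line in the dump starts with the thread name in double quotes, and it matches names by substring, so any other thread whose name contains "DataXceiver" would be counted as well.

```java
import java.util.List;

public class XceiverTally {

    // Returns {dataXceiverCount, packetResponderCount} for the given dump lines.
    // Assumption: thread header lines begin with the quoted thread name.
    static int[] tally(List<String> jstackLines) {
        int dataXceivers = 0;
        int packetResponders = 0;
        for (String line : jstackLines) {
            if (!line.startsWith("\"")) {
                continue; // stack frames and state lines are skipped
            }
            int end = line.indexOf('"', 1);
            if (end < 0) {
                continue;
            }
            String name = line.substring(1, end);
            if (name.contains("PacketResponder")) {
                packetResponders++;
            } else if (name.contains("DataXceiver")) {
                dataXceivers++;
            }
        }
        return new int[] {dataXceivers, packetResponders};
    }

    // Threads left before the shared limit is reached; negative means over.
    static int headroom(int dataXceivers, int packetResponders, int max) {
        return max - (dataXceivers + packetResponders);
    }

    public static void main(String[] args) {
        // Numbers reported in this thread: 1336 + 2796 against a limit of 4096.
        System.out.println(headroom(1336, 2796, 4096));
    }
}
```

With the figures from this thread, the combined total is 4132, which is 36 over the configured 4096, so the headroom comes out negative.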

I hope this helps.

Thanks,
--Chris
