Posted to user@accumulo.apache.org by thomasa <th...@ccri.com> on 2014/05/19 22:21:45 UTC

Losing tservers - Unusually high Last Contact times

Hello all,

I am having issues with tablet servers going down due to poor contact times
(my hypothesis at least). In the past I have had good stability with
smaller clouds (20-40 nodes), but have run into issues with a larger number
of nodes (150+). Each node runs a datanode, nodemanager, and tablet server.
There is a master node running the hadoop namenode, hadoop resource manager,
and the accumulo master, monitor, etc. There are three zookeeper nodes.
All nodes are VMs. This same setup is used on the smaller, stable clouds as
well.

I do not believe memory allocation is an issue as I have only given
hadoop/yarn (2.2.0) and accumulo (1.5.1) less than half of the available
memory. The FATAL errors I have seen are:

Lost tablet server lock (reason = SESSION_EXPIRED), exiting

Lost ability to monitor tablet server lock, exiting

Other than bumping up the rpc timeout (which I have done, but I would rather
find the root cause of the problem), I have run out of ideas on how to solve
this issue.
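For reference, the rpc timeout in question should be general.rpc.timeout in
accumulo-site.xml (assuming I have the property name right; the value below is
only an example):

  <property>
    <name>general.rpc.timeout</name>
    <value>240s</value>
  </property>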

Does anyone have any insight into why I would be seeing such bad response
times between nodes? Are there any configuration parameters I can play with
to fix this?

I realize this is a very general question, so let me know if there is any
information I can provide to help clarify the issue.

Thank you in advance for your time.

Thomas




Re: Losing tservers - Unusually high Last Contact times

Posted by thomasa <th...@ccri.com>.
Josh,

Thanks for the suggestions.

When I referenced changing the number of map tasks above, I was actually
changing yarn.nodemanager.resource.memory-mb in yarn-site.xml.

I think I may just be constrained to a less aggressive ingest due to the
possible disk sharing :/.




Re: Losing tservers - Unusually high Last Contact times

Posted by Josh Elser <jo...@gmail.com>.
On 5/21/14, 12:00 PM, thomasa wrote:
> Increasing the timeout settings helped a little, but when I tried to increase
> the number of map tasks for the workers I ran into instability issues.
>
> After re-reading my original post, I think I left out some important
> details. The type of job I am trying to run is a MapReduce ingest that uses
> batch writers to populate an Accumulo table. On previous, smaller clouds, I
> had control of disk allocation and made sure to assign a disk per worker to
> avoid write conflicts. On this larger cloud, the disk management is
> transparent to me, but I believe the physical disks backing the VMs are seen
> as one large virtual pool. Write times on the big, unstable cloud are very
> fast, 3-4 times that of our smaller clouds, but that is measured when I dd a
> file on just one VM. I think when all 150+ nodes are writing to disk, more
> than one node will try to write to the same physical disk and cause
> problematic iowait% (20-50% at least).

You could always try your `dd` trick across many nodes at once using 
pdsh or pssh. That may be a quick way to confirm your hypothesis.
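For instance, something along these lines (the host list, mount point, and
sizes are only placeholders to adapt):

  pdsh -w 'worker[001-150]' \
    'dd if=/dev/zero of=/data1/ddtest.bin bs=1M count=2048 oflag=direct conv=fsync 2>&1 | tail -1; rm -f /data1/ddtest.bin'

If per-node throughput collapses (or iowait spikes) when every node writes at
once, that points at the shared physical disks.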

> So, given my situation, what is the best way to configure accumulo knowing
> that the workers share disks and will have write conflicts? Do I just bump
> resources down for ingest for stability then ramp them up for non-ingest
> jobs?

The simple change you could make would be to just reduce the amount of 
memory available for each NodeManager to use 
(yarn.nodemanager.resource.memory-mb in yarn-site.xml), which in turn, 
would reduce the number of concurrent Containers run by the 
NodeManagers, and ultimately reduce the amount of data being sent to 
Accumulo.
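As a rough sketch, in yarn-site.xml on each worker (the value is made up;
size it for your hardware):

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>

With 2gb map containers, a value like that would cap each node at roughly two
concurrent tasks.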

Depending on the data and your ingest process, there may be more you can 
do on each client, but that's getting a bit into the weeds.


Re: Losing tservers - Unusually high Last Contact times

Posted by thomasa <th...@ccri.com>.
Increasing the timeout settings helped a little, but when I tried to increase
the number of map tasks for the workers I ran into instability issues.

After re-reading my original post, I think I left out some important
details. The type of job I am trying to run is a MapReduce ingest that uses
batch writers to populate an Accumulo table. On previous, smaller clouds, I
had control of disk allocation and made sure to assign a disk per worker to
avoid write conflicts. On this larger cloud, the disk management is
transparent to me, but I believe the physical disks backing the VMs are seen
as one large virtual pool. Write times on the big, unstable cloud are very
fast, 3-4 times that of our smaller clouds, but that is measured when I dd a
file on just one VM. I think when all 150+ nodes are writing to disk, more
than one node will try to write to the same physical disk and cause
problematic iowait% (20-50% at least).

So, given my situation, what is the best way to configure accumulo knowing
that the workers share disks and will have write conflicts? Do I just bump
resources down for ingest for stability then ramp them up for non-ingest
jobs?




Re: Losing tservers - Unusually high Last Contact times

Posted by thomasa <th...@ccri.com>.
These are all very good suggestions. I am going to change some settings, run
another job, and monitor the performance.






Re: Losing tservers - Unusually high Last Contact times

Posted by Josh Elser <jo...@gmail.com>.
On 5/20/14, 10:21 AM, thomasa wrote:
> I was worried about how many connections would be open on the larger cloud,
> so I significantly reduced the number of YARN processes. Side question: does
> each worker node have a connection with every other node?

Are you referring to the YARN processes or Accumulo processes? For YARN, 
I believe the container will primarily be communicating back to the RM 
for MapReduce, but a custom app could be doing anything.

For Accumulo, a tserver will mostly only be communicating with the 
master. I know this isn't entirely true, though. For example, tservers 
will communicate with other tservers as part of bulk importing.

> If they did, my
> guess was that there would be significantly more open connections on a 150+
> node cloud than a 40 node cloud. For that reason, I only have 2 YARN
> processes with 2 GB of memory each on the larger cloud that is seeing the
> issues. My thought was that each YARN process needs a core, the tablet
> server needs a core, and OS stuff could probably use a core.

Yes, you should most definitely be leaving headroom on a system for the 
operating system. A core and 1G of RAM is probably a good starting 
point, but YMMV.


To increase the zookeeper timeout, you can try this, but it will have 
other implications, such as failure detection/recovery being slower:

In accumulo-site.xml: set instance.zookeeper.timeout equal to something 
like 45s or 60s (default is 30s as Dave mentioned earlier).

In zoo.cfg: set maxSessionTimeout equal to the above, but in 
milliseconds, e.g. 45000 or 60000.
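Concretely, something like the following (values are only examples, and both
changes need the affected processes restarted, as far as I know).

In accumulo-site.xml:

  <property>
    <name>instance.zookeeper.timeout</name>
    <value>60s</value>
  </property>

In zoo.cfg on each ZooKeeper server:

  maxSessionTimeout=60000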

Re: Losing tservers - Unusually high Last Contact times

Posted by Josh Elser <jo...@gmail.com>.
iostat -x 2 sda

The '-x' flag asks for extended statistics, and the '2' is the interval 
(in seconds) at which iostat will keep printing them. You can (optionally) 
also give the name(s) of the devices you want to watch.

So, assuming that your ZK transaction log (specified in 
$ZOOKEEPER_HOME/conf/zoo.cfg under the 'dataDir' key) is mounted on 
/dev/sda, the above command would list statistics every 2 seconds until 
you ctrl-C the process.
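Putting that together (the dataDir value below is just a guess; use whatever
your zoo.cfg actually says):

  grep dataDir $ZOOKEEPER_HOME/conf/zoo.cfg
  df /var/lib/zookeeper    # the Filesystem column shows the backing device, e.g. /dev/sda1
  iostat -x 2 sda          # watch await and %util (and the CPU %iowait line) while a job runs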

On 5/20/14, 10:21 AM, thomasa wrote:
>
> What is the best way to check the iowait times for the ZK transaction log?

Re: Losing tservers - Unusually high Last Contact times

Posted by thomasa <th...@ccri.com>.
Thank you for the responses. The number of CPUs has been something I have
considered. The worker nodes only have 4 CPUs. The YARN processes are
running on the same nodes as the tablet servers.

On another cloud with 8 CPUs per worker, we have been able to run 10
YARN processes with 2 GB of memory each. Even though this configuration
thrashes the workers (I have seen OS loads over 20), the tablet servers stay
up.

I was worried about how many connections would be open on the larger cloud,
so I significantly reduced the number of YARN processes. Side question: does
each worker node have a connection with every other node? If they did, my
guess was that there would be significantly more open connections on a 150+
node cloud than a 40 node cloud. For that reason, I only have 2 YARN
processes with 2 GB of memory each on the larger cloud that is seeing the
issues. My thought was that each YARN process needs a core, the tablet
server needs a core, and OS stuff could probably use a core.

Is there a more elegant way to see if the tablet server is being pushed into
swap or starved of CPU other than just watching top during the YARN job?

I did look into zookeeper loads a little bit, but I would be a little
surprised to see issues there, as the zookeeper nodes on the big cloud
(1 CPU, 8 GB RAM) have significantly more RAM than the zookeepers on the
smaller cloud (1 CPU, 1 GB RAM). I did up the GC memory limit for the
Accumulo gc process as I was seeing issues there early on.
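For reference, that GC bump was a change in accumulo-env.sh, roughly like the
line below (the exact variable name and sizes should be checked against your
own copy of the file):

  test -z "$ACCUMULO_GC_OPTS" && export ACCUMULO_GC_OPTS="-Xmx256m -Xms256m"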

What is the best way to check the iowait times for the ZK transaction log?




Re: Losing tservers - Unusually high Last Contact times

Posted by Sean Busbey <bu...@cloudera.com>.
Another thing to check on the zookeeper servers is the iowait times for
whatever virtual disk the ZK transaction log is using.

-- 
Sean
On May 19, 2014 8:20 PM, "Keith Turner" <ke...@deenlo.com> wrote:

>
>
>
> On Mon, May 19, 2014 at 6:56 PM, <dl...@comcast.net> wrote:
>
>> You are hitting the zookeeper timeout, default 30s I believe. You said you
>> are not oversubscribed for memory, but what about CPU? Are you running
>> YARN
>> processes on the same nodes as the tablet servers? Is the tablet server
>> being pushed into swap or starved of CPU?
>>
>
> Also check on the zookeeper server nodes.  Is Java GC pausing tservers or
> zookeeper servers?

Re: Losing tservers - Unusually high Last Contact times

Posted by Keith Turner <ke...@deenlo.com>.
On Mon, May 19, 2014 at 6:56 PM, <dl...@comcast.net> wrote:

> You are hitting the zookeeper timeout, default 30s I believe. You said you
> are not oversubscribed for memory, but what about CPU? Are you running YARN
> processes on the same nodes as the tablet servers? Is the tablet server
> being pushed into swap or starved of CPU?
>

Also check on the zookeeper server nodes.  Is Java GC pausing tservers or
zookeeper servers?
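
One way to check (the flags and paths below are only an example for a Java 7
JVM) is to turn on GC logging for the tserver and ZooKeeper JVMs and look for
long pauses:

  # e.g. appended to ACCUMULO_TSERVER_OPTS in accumulo-env.sh, and to
  # JVMFLAGS in ZooKeeper's conf/java.env
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/gc.log

Any single collection that gets anywhere near the 30s session timeout in those
logs would explain the expired locks.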



RE: Losing tservers - Unusually high Last Contact times

Posted by dl...@comcast.net.
You are hitting the zookeeper timeout, default 30s I believe. You said you
are not oversubscribed for memory, but what about CPU? Are you running YARN
processes on the same nodes as the tablet servers? Is the tablet server
being pushed into swap or starved of CPU?
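
A quick way to check while a job is running (the pgrep pattern is only a guess
at how the tserver shows up in the process command line):

  TSERVER_PID=$(pgrep -f 'org.apache.accumulo.start.*tserver' | head -1)
  grep -E 'VmSwap|VmRSS' /proc/$TSERVER_PID/status   # non-zero VmSwap means pages were swapped out (recent kernels)
  vmstat 5                                           # watch si/so (swap activity) and r (run queue) during the job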
