Posted to user@accumulo.apache.org by Dave Mullins <pa...@gmail.com> on 2013/11/01 21:17:53 UTC

Accumulo Upgrade from 1.4.2 to 1.5.0 Issues

Hadoop version 0.20.2-cdh3u5
This was installed from the CDH RPMs but is not managed by Cloudera
Manager.

I read what documentation I could find on the upgrade.
I installed from the tarball version of 1.5.0.
I made sure to include the commons collection in the Accumulo library path.
I made sure to set dfs.support.append to true in the hdfs-site files.
I did a complete restart (including a reboot) of the system.
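
For anyone repeating the steps above, the append setting is an ordinary property entry in hdfs-site.xml, and it is worth confirming that every node actually picked it up. A minimal sketch for reading the value back, assuming the file uses the usual Hadoop layout with each <name> and <value> element on a single line. The helper name get_hadoop_property and the example path are illustrative assumptions, not part of Hadoop or Accumulo:

```shell
#!/bin/sh
# Print the value of one property from a Hadoop-style XML config file.
# Assumption: <name>...</name> and <value>...</value> each sit on one line,
# with the value following its name (the common hand-edited layout).
get_hadoop_property() {
    # $1 = property name, $2 = path to the config file
    awk -v name="$1" '
        # remember that we have reached the property we are looking for
        index($0, "<name>" name "</name>") { found = 1 }
        # the first <value> at or after that point belongs to it
        found && match($0, /<value>[^<]*<\/value>/) {
            # strip the 7-character <value> and 8-character </value> tags
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$2"
}

# Example call (the path is an assumption; adjust to the local install):
#   get_hadoop_property dfs.support.append /etc/hadoop/conf/hdfs-site.xml
```

An empty result means the property was not found in that file. This is a text match rather than a real XML parse, so it will miss entries whose tags are split across lines.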

All of the tablet servers come online.
All of the master's services come online and seem to be working. (The monitor
shows the correct number of tablets, tablet servers, and so forth.)

I am able to use some of the features of the Accumulo shell.
I can display the contents of a table.
I can't create or delete a table without getting the following error:
[impl.ThriftTransportPool] WARN: Thread "shell" stuck on io to
x.x.x.x:9999:9999 (0) for at least 120040 ms

When I go digging in the logs I find very few errors. (These systems are
not on a network I can cut and paste from, so I am trying to represent the
issue as best I can.)

There are 4 errors saying that the Repo runner [0-3] threads died.

Another error that springs up occasionally is: WARN: Thread "GC" stuck on
io to x.x.x.x:9999:9999 (0) for at least 120040 ms

A netstat run before I start the master up shows nothing running on port
9999 nor any connections to that port.
A netstat run after Accumulo starts shows about 16 connections in a
TIME_WAIT state in the 35k-36k port range from the master. It also shows an
established state for 1 in both directions (36783) and inbound from port
9999 to port 47636, also from the master.

It seems after this point anything that tries to connect to port 9999 goes
into a TIME_WAIT and never does anything.
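
The before-and-after netstat comparison described above can be made repeatable by counting connections per TCP state for the port in question. A rough sketch, assuming Linux net-tools `netstat -tan` output (columns: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name summarize_port_states is made up for illustration:

```shell
#!/bin/sh
# Count TCP connections per state for one port, reading
# `netstat -tan` style lines on standard input.
summarize_port_states() {
    # $1 = port number to look for on either end of the connection
    awk -v p=":$1" '
        # keep tcp rows whose local or foreign address ends in :<port>
        $1 ~ /^tcp/ && (substr($4, length($4) - length(p) + 1) == p ||
                        substr($5, length($5) - length(p) + 1) == p) {
            count[$6]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# On a live host (not run here):
#   netstat -tan | summarize_port_states 9999
```

Running it once before the master starts and again a few minutes after gives two short summaries that are easier to retype from an isolated network than raw netstat output.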

I have checked all the permissions I can think of and everything seems to
be correct.
HDFS is running correctly and jobs not associated with Accumulo all seem to
be working.

Re: Accumulo Upgrade from 1.4.2 to 1.5.0 Issues

Posted by Dave Mullins <pa...@gmail.com>.
It begins by showing various txid lines.
Then the errors begin.

Thread "org.apache.accumulo.server.fate.Admin" died null
It then goes into a series of errors.
At the bottom, these all stem from a deserialization error, in a section
starting with a "caused by:java.io.EOFException" error.

I will attempt to get a printout of the errors to hand type in if need be.





On Mon, Nov 4, 2013 at 1:51 PM, Eric Newton <er...@gmail.com> wrote:

> These symptoms would appear to be caused by problems with table
> operations, which are heavily dependent on the master being able to
> use data in zookeeper.
>
> So, try to find the first errors, especially those related to
> serialization or deserialization closest to when the master first
> started.
>
> What do you get when you run:
>
> $ ./bin/accumulo org.apache.accumulo.server.fate.Admin print
>
> ?
>
> -Eric

Re: Accumulo Upgrade from 1.4.2 to 1.5.0 Issues

Posted by Eric Newton <er...@gmail.com>.
These symptoms would appear to be caused by problems with table
operations, which are heavily dependent on the master being able to
use data in zookeeper.

So, try to find the first errors, especially those related to
serialization or deserialization closest to when the master first
started.

What do you get when you run:

$ ./bin/accumulo org.apache.accumulo.server.fate.Admin print

?

-Eric

