Posted to user@accumulo.apache.org by Roger Lloyd <ro...@gmail.com> on 2012/07/15 01:18:36 UTC

A couple of anomalies

I was looking for some insight regarding a couple of issues I have seen,
and their likely causes and solutions.

1) Tables go blank

So, everything is kicking along fine: I am loading data, and it works
beautifully for days, even weeks, adding hundreds of millions of entries,
splitting tablets, etc.  Just smooth.  Then suddenly, in the web console,
all the tables just show "-" for their values (except the !METADATA
table).

What could/would cause this?

What is the smart way to react?  Our previous responses have been 1) re-init
and reload through the Client API, and 2) re-init and recover the tables
using the bulk loading scheme mentioned on this mailing list.  I suspect we
have taken more drastic action than necessary, simply because we could
afford to reload.  As our deployment grows, that will be less of an
option.  I'm not sure whether we are doing something wrong overall.

2) Client connections to ZooKeeper

When I run a client I'm writing in Eclipse, it repeatedly cycles
connections, creating and closing sessions (with no errors at all).  But if
I suspend the thread in Eclipse and resume it, the session opens and stays
open.  I realize this is probably a ZooKeeper problem, but can someone give
me a quick rundown of what is happening under the hood, so I can try
running some zkCli commands to reproduce the issue?
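One way to watch the churn from the server side, instead of from inside Eclipse, is to poll ZooKeeper's `cons` four-letter command, which lists each client connection together with its session id (`sid=0x...`). The sketch below is a minimal example assuming a ZooKeeper 3.4.x server on `localhost:2181` that answers four-letter words on its client port; the helper names (`four_letter_word`, `session_ids`, `watch`) are hypothetical and not part of ZooKeeper or Accumulo.

```python
import re
import socket
import time

# Each client connection in a 'cons' reply carries its session id as sid=0x...
SID_RE = re.compile(r"sid=(0x[0-9a-fA-F]+)")


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command such as 'cons' or 'stat' and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket once it has replied
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def session_ids(cons_output):
    """Extract the set of session ids from a 'cons' reply."""
    return set(SID_RE.findall(cons_output))


def diff_sessions(before, after):
    """Return (closed, opened) session ids between two snapshots."""
    return before - after, after - before


def watch(host="localhost", port=2181, interval=2.0, rounds=30):
    """Poll 'cons' and print whenever sessions disappear or appear."""
    previous = session_ids(four_letter_word(host, port, "cons"))
    for _ in range(rounds):
        time.sleep(interval)
        current = session_ids(four_letter_word(host, port, "cons"))
        closed, opened = diff_sessions(previous, current)
        if closed or opened:
            print(f"closed={sorted(closed)} opened={sorted(opened)}")
        previous = current


if __name__ == "__main__":
    watch()
```

Run it while the Eclipse client is running: a client that keeps creating and closing sessions should show up as a steady stream of closed/opened pairs, while the long-lived sessions held by the Accumulo servers stay put. From a shell, `echo cons | nc localhost 2181` returns the same listing for a one-off look.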

We are running Accumulo 1.4.0-incubating-SNAPSHOT and ZooKeeper 3.4.3.  If
we wanted to upgrade to 1.4.1, how involved would that be?  Would we just
replace the jar files and the config files, or would we need to migrate
data?

Thanks for your help.

Roger

Re: A couple of anomalies

Posted by Eric Newton <er...@gmail.com>.
The monitor program, which runs the Jetty server, is independent of
the master.  Your master can be down and the page will still refresh.

But almost all of the data comes from the master, which is collecting
stats for everything, primarily to support load balancing.

The differences between the pages are really just different summaries
of the data coming from the master.  You can get the same data using
the GetMasterStats class; there's a Thrift RPC call that pulls the
current data.

The JMX data comes from a completely separate monitoring interface,
which the monitor does not use, but all the same data should be
available there.

The benefit of the monitor is that it has been instrumented to
highlight unusual or unexpected values and conditions.  When I'm
providing support to other teams (typically by phone), it's a great
way to get critical information quickly.

-Eric

On Sat, Jul 14, 2012 at 9:21 PM, Roger Lloyd <ro...@gmail.com> wrote:

Re: A couple of anomalies

Posted by Roger Lloyd <ro...@gmail.com>.
Yeah, so it seems that our number one mistake is taking the Master down in
response to having issues.  I guess you get so comfortable bringing the
cluster up and down when you are first starting that it seems like a
natural knee-jerk reaction.  This most recent time there was something in
yellow/red, but I don't recall what it said and it didn't make sense to
me.  Since I was having problems with the web console and wasn't sure of
the actual state of the Master, I just tried to stop it.  When it pushed
back on shutting down (running stop-all.sh) with something about access
denied, I cancelled out of the shutdown script, so who knows where it
ended up.

Could you explain a little more about the Master's monitoring console?  Does
it run an embedded Jetty instance and render data from JMX MBeans in the
running Master?  I know there is an XML representation, and I thought I saw
something about embedding it in a separate JMX console (or maybe I'm
blurring it with my ZK and Hadoop reading).  Is there a data store that
holds that data, and is it accessible by some other means if the web
console isn't responding?

On Sat, Jul 14, 2012 at 8:09 PM, Eric Newton <er...@gmail.com> wrote:


Re: A couple of anomalies

Posted by Eric Newton <er...@gmail.com>.
Is there anything red or yellow on the monitor pages?

There's a layering to availability:

Most of the monitoring is done via the master, so if it has recently
restarted, you will see almost no useful information.

The first tablet of the METADATA table needs to be assigned, recovered
and functional.  If you see only one tablet assigned... it needs to be
healthy before anything else can happen.

Next, the rest of the METADATA table needs to be assigned, recovered
and functional.

If you are seeing "-" then the METADATA table is not available for some reason.
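The bottom-up order above amounts to a simple rule: find the lowest layer that is not yet assigned, recovered and functional, because nothing above it can come up until it does. This is a hypothetical sketch of that rule only; the `Layer` names and the `first_blocking_layer` helper are illustrative, not part of Accumulo, and the health flags would have to come from what you read off the monitor.

```python
from enum import Enum


class Layer(Enum):
    # Declared bottom-up: each layer depends on every layer before it.
    FIRST_METADATA_TABLET = "first tablet of !METADATA"
    REST_OF_METADATA = "rest of the !METADATA table"
    USER_TABLES = "user tables"


def first_blocking_layer(healthy):
    """Return the lowest layer that is not assigned, recovered and functional.

    `healthy` maps Layer -> bool; missing entries count as unhealthy.
    Returns None when every layer is healthy.
    """
    for layer in Layer:  # Enum iteration follows declaration order
        if not healthy.get(layer, False):
            return layer
    return None


if __name__ == "__main__":
    # Example: the first !METADATA tablet is up, but the rest of !METADATA is not.
    status = {
        Layer.FIRST_METADATA_TABLET: True,
        Layer.REST_OF_METADATA: False,
        Layer.USER_TABLES: False,
    }
    print(f"Fix first: {first_blocking_layer(status).value}")
```

The point of ordering the checks this way is that a problem in a user table is usually a symptom; repairing a higher layer before the one beneath it is healthy does not help.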

Ensure that Hadoop and ZooKeeper are not using /tmp for storage.
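Many systems clear /tmp on reboot or periodically, which can silently remove HDFS blocks or ZooKeeper state. A quick way to act on this check is to scan the configs for storage directories that resolve under /tmp. The sketch below assumes Hadoop 1.x property names (`hadoop.tmp.dir`, `dfs.name.dir`, `dfs.data.dir`; later Hadoop versions renamed the HDFS ones) and a standard `zoo.cfg`; the helper names and the command-line usage are hypothetical. It also flags the HDFS directories when they are left unset, because they then default to subdirectories of `hadoop.tmp.dir`, which itself defaults to `/tmp/hadoop-${user.name}`.

```python
import sys
import xml.etree.ElementTree as ET

# Hadoop 1.x defaults: HDFS storage lives under hadoop.tmp.dir unless set explicitly.
HADOOP_TMP_DEFAULT = "/tmp/hadoop-${user.name}"
HDFS_DIR_DEFAULTS = {
    "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
ZK_KEYS = ("dataDir", "dataLogDir")


def under_tmp(path):
    return path == "/tmp" or path.startswith("/tmp/")


def zoo_cfg_dirs(text):
    """Return dataDir/dataLogDir from the key=value lines of a zoo.cfg."""
    dirs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ZK_KEYS:
            dirs[key] = value
    return dirs


def hadoop_xml_dirs(text):
    """Return hadoop.tmp.dir and the HDFS dirs set in a core-site.xml or hdfs-site.xml."""
    wanted = {"hadoop.tmp.dir", *HDFS_DIR_DEFAULTS}
    dirs = {}
    for prop in ET.fromstring(text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in wanted:
            dirs[name] = (prop.findtext("value") or "").strip()
    return dirs


def tmp_problems(zk, hadoop):
    """List every ZooKeeper or HDFS storage directory that resolves under /tmp."""
    problems = [f"zoo.cfg {k}={v}" for k, v in zk.items() if under_tmp(v)]
    tmp_dir = hadoop.get("hadoop.tmp.dir", HADOOP_TMP_DEFAULT)
    for key, default in HDFS_DIR_DEFAULTS.items():
        # Both keys accept comma-separated lists of directories.
        for path in hadoop.get(key, default).split(","):
            path = path.strip().replace("${hadoop.tmp.dir}", tmp_dir)
            if under_tmp(path):
                problems.append(f"{key}={path}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_tmp.py zoo.cfg core-site.xml hdfs-site.xml
    zoo_path, *site_paths = sys.argv[1:]
    with open(zoo_path) as f:
        zk = zoo_cfg_dirs(f.read())
    hadoop = {}
    for site_path in site_paths:
        with open(site_path) as f:
            hadoop.update(hadoop_xml_dirs(f.read()))
    for problem in tmp_problems(zk, hadoop) or ["no /tmp storage found"]:
        print(problem)
```

The sample `zoo.cfg` that ships with ZooKeeper sets `dataDir=/tmp/zookeeper`, so a copied sample config is a common way to end up with ZooKeeper state in /tmp.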

-Eric

On Sat, Jul 14, 2012 at 7:18 PM, Roger Lloyd <ro...@gmail.com> wrote: