Posted to user@accumulo.apache.org by Benson Margulies <bi...@gmail.com> on 2012/02/15 15:40:45 UTC

I put my machine to sleep, and now I've a problem

I have an accumulo install on my laptop. (1.3.5) Yesterday, without
stopping any processes, I closed the cover to put it to sleep.

Today, I came back to find that the master wasn't running. So I did
something fairly dumb, and ran start-all again. Here's what happened.
Trying to run stop-all.sh now gets

15 09:37:35,950 [impl.ThriftTransportPool] WARN : Thread "admin" stuck
on IO  to localhost:9999:9999 (0) for at least 120203 ms

I guess I'll just start firing up kill commands and hope for the best.

15 09:32:15,536 [shell.Shell] ERROR:
org.apache.accumulo.core.client.AccumuloException:
org.apache.thrift.transport.TTransportException: Failed to connect to
a server
█▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:16P
/opt/accumulo-1.3.5-incubating/ jps
2780 Main
2086 QuorumPeerMain
1990 DataNode
13478 Jps
2072 SecondaryNameNode
1907 NameNode
2894 Main
3014
█▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:23P
/opt/accumulo-1.3.5-incubating/ bin/start-all.sh
Starting tablet servers and loggers .... done
Starting logger on localhost
Starting tablet server on localhost
2012-02-15 09:32:31.647 java[13558:df03] Unable to load realm info
from SCDynamicStore
15 09:32:31,813 [security.UserGroupInformation] INFO : JAAS
Configuration already set up for Hadoop, not re-installing.
Starting master on localhost
Starting garbage collector on localhost
localhost : monitor already running (2780)
localhost : tracer already running (2894)
█▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:36P
/opt/accumulo-1.3.5-incubating/ bin/accumulo shell -u root
2012-02-15 09:32:43.356 java[14170:df03] Unable to load realm info
from SCDynamicStore
15 09:32:43,467 [security.UserGroupInformation] INFO : JAAS
Configuration already set up for Hadoop, not re-installing.
Enter current password for 'root'@'default': ******

Shell - Accumulo Interactive Shell
-
- version: 1.3.5-incubating
- instance name: default
- instance id: 0f20de2f-a1b7-47dc-a877-740f901b22db
-
- type 'help' for a list of available commands
-
root@default> droptable entity
15 09:34:52,478 [impl.ThriftTransportPool] WARN : Thread "shell" stuck
on IO  to localhost:9999:9999 (0) for at least 120211 ms

Re: Suspension

Posted by Adam Fuchs <ad...@ugov.gov>.

I think this makes a lot of sense. I use Accumulo enough on a laptop to be
annoyed at how often I have to run start-all.sh.

One way we could do this is to have a separate daemon process restart
accumulo processes anytime they go down. I think log recovery is almost as
efficient as any other way of suspending memory to disk, and it doesn't add
any extra complexity to the code base. The only other concern is having the
daemon restart a process that should actually be down, and we would have to
work out the model for that.

Adam

On Wed, Feb 15, 2012 at 9:54 AM, Aaron Cordova <aa...@cordovas.org> wrote:

> EC2 as well as laptop users would be interested in making Accumulo
> 'suspendable'. The self-monitoring features end up killing off processes
> upon awakening. Perhaps this could be implemented by a simple switch that
> tells Accumulo not to worry about abandoning processes that don't report,
> that can be enabled before suspension and disabled after .. or simply left
> enabled for stand-alone laptop users.
>
> Does it make sense to make it possible to suspend a running Accumulo
> instance, or should this simply be discouraged and made well known?
>
>

Re: Suspension

Posted by John Vines <jo...@ugov.gov>.

On Feb 15, 2012 10:57 AM, "Billie J Rinaldi" <bi...@ugov.gov>
wrote:
>
> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
aaron@cordovas.org> wrote:
> > Such an option would have to be very conspicuous so that users don't
> > accidentally enable it and then wonder why bad tablet servers aren't
> > removed automatically from the cluster.
>
> We could call it laptop.mode.

+1

>
> Billie

Re: Suspension

Posted by Adam Fuchs <ad...@ugov.gov>.

This isn't really just a laptop problem. We also see hiccups in clusters
(admins accidentally the whole network, etc.) that we would want to
automatically recover from. I think having self-restarting processes could
be generally useful.

I think that an option of not using zookeeper timeouts might lead to abuse,
and could be very bad for stability under rare failure modes. We make a lot
of assumptions throughout the code about these timeouts, and we would have
to reconsider a large part of that model.

Adam

On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
billie.j.rinaldi@ugov.gov> wrote:

> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
> aaron@cordovas.org> wrote:
> > Such an option would have to be very conspicuous so that users don't
> > accidentally enable it and then wonder why bad tablet servers aren't
> > removed automatically from the cluster.
>
> We could call it laptop.mode.
>
> Billie
>

Re: Suspension

Posted by Aaron Cordova <aa...@cordovas.org>.

I don't know if a general process-starting service belongs in the Accumulo project .. but, it is cumbersome to run a large distributed service without some such service. Are there existing things out there that come close? I'm sure there are tools that monitor machines for missing processes that can ssh in and restart them .. 

There are systems that are designed according to principles such as "CrashOnlySoftware" and Erlang's "LetItCrash" in which processes are stopped by crashing and startup always involves recovery, which is kind of elegant since you don't have to design non-crash and non-recovery stop and start sequences. However, I think Accumulo is not quite designed that way right now and it might be a lot of work to make it that way and it might not be a good idea anyway.

I also anticipate that making all processes, including ZooKeeper, able to continue operating in the presence of large gaps of time might be a lot of work and might destabilize monitoring and recovery mechanisms already in place. It would only be worth doing if it became clear that it could be done cleanly, and while keeping the standard, non-laptop, and non-VM/non-EC2 mode of operation intact. I value stability over nice-to-have but outside-the-core-use-case features at this point.

On Feb 15, 2012, at 11:38 AM, Joey Echeverria wrote:

> Systems I've used that include automatic restart usually have a limit of restarting 3-4 times in a row before giving up. It's nice if you can have a timeout on that counter so you retain the auto-restart capability if you need to suspend a few days from now.
> 
> I've also worked on a system where process restarts were the way we handled failures. ZooKeeper state can be tricky to recover if you've been down long enough for your session to expire. I found it easier to just kill the process and go through the full "boot-up" logic. In that system, we had the shell scripts that launch the JVMs handle the restart, with the restart policy dictated by exit code.
> 
> -Joey
> 
> On Wed, Feb 15, 2012 at 11:16 AM, John Vines <jo...@ugov.gov> wrote:
> There are too many cases where a node legitimately died and we do not want it constantly coming back and bogging things down. How do you design it to restart the accidental deaths but not the deserved ones?
> 
> On Feb 15, 2012 11:11 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:
> This isn't really just a laptop problem. We also see hiccups in clusters (admins accidentally the whole network, etc.) that we would want to automatically recover from. I think having self-restarting processes could be generally useful.
> 
> I think that an option of not using zookeeper timeouts might lead to abuse, and could be very bad for stability under rare failure modes. We make a lot of assumptions throughout the code about these timeouts, and we would have to reconsider a large part of that model.
> 
> Adam
> 
> 
> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <bi...@ugov.gov> wrote:
> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <aa...@cordovas.org> wrote:
> > Such an option would have to be very conspicuous so that users don't
> > accidentally enable it and then wonder why bad tablet servers aren't
> > removed automatically from the cluster.
> 
> We could call it laptop.mode.
> 
> Billie
> 
> 
> 
> 
> -- 
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>

Re: Suspension

Posted by Joey Echeverria <jo...@cloudera.com>.

Systems I've used that include automatic restart usually have a limit of
restarting 3-4 times in a row before giving up. It's nice if you can have
a timeout on that counter so you retain the auto-restart capability if you
need to suspend a few days from now.

I've also worked on a system where process restarts were the way we handled
failures. ZooKeeper state can be tricky to recover if you've been down
long enough for your session to expire. I found it easier to just kill the
process and go through the full "boot-up" logic. In that system, we had
the shell scripts that launch the JVMs handle the restart, with the restart
policy dictated by exit code.

-Joey
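
[Editor's sketch] The restart policy described above (a cap on consecutive
restarts, a counter that resets after a quiet period, and exit codes that
decide whether to restart at all) might look roughly like the following.
This is only an illustration: the class name, the exit-code set, and the
default limits are assumptions made here, not anything taken from Accumulo
or from the system Joey describes.

```python
import time

# Hypothetical policy: exit codes that mean "safe to restart".
# These values are illustrative assumptions, not Accumulo's real exit codes.
RESTARTABLE_EXIT_CODES = {1, 75}


class RestartPolicy:
    """Decide whether a process that just exited should be restarted."""

    def __init__(self, max_restarts=3, reset_after_s=3600.0, clock=time.time):
        self.max_restarts = max_restarts    # restarts allowed "in a row"
        self.reset_after_s = reset_after_s  # quiet period that clears the counter
        self.clock = clock                  # injectable so tests can fake time
        self.count = 0
        self.last_restart = None

    def should_restart(self, exit_code):
        now = self.clock()
        # Forget old failures once enough time has passed since the last restart.
        if self.last_restart is not None and now - self.last_restart >= self.reset_after_s:
            self.count = 0
        if exit_code not in RESTARTABLE_EXIT_CODES:
            return False  # the exit code says this process should stay down
        if self.count >= self.max_restarts:
            return False  # too many restarts in a row: give up
        self.count += 1
        self.last_restart = now
        return True
```

A launcher script would call should_restart() with the child's exit code
each time the child exits, and relaunch only when it returns True.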

On Wed, Feb 15, 2012 at 11:16 AM, John Vines <jo...@ugov.gov> wrote:

> There are too many cases where a node legitimately died and we do not want
> it constantly coming back and bogging things down. How do you design it to
> restart the accidental deaths but not the deserved ones?
> On Feb 15, 2012 11:11 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:
>
>> This isn't really just a laptop problem. We also see hiccups in clusters
>> (admins accidentally the whole network, etc.) that we would want to
>> automatically recover from. I think having self-restarting processes could
>> be generally useful.
>>
>> I think that an option of not using zookeeper timeouts might lead to
>> abuse, and could be very bad for stability under rare failure modes. We
>> make a lot of assumptions throughout the code about these timeouts, and we
>> would have to reconsider a large part of that model.
>>
>> Adam
>>
>>
>> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
>> billie.j.rinaldi@ugov.gov> wrote:
>>
>>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
>>> aaron@cordovas.org> wrote:
>>> > Such an option would have to be very conspicuous so that users don't
>>> > accidentally enable it and then wonder why bad tablet servers aren't
>>> > removed automatically from the cluster.
>>>
>>> We could call it laptop.mode.
>>>
>>> Billie
>>>
>>
>>

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Suspension

Posted by Aaron Cordova <aa...@cordovas.org>.

Yeah, we don't want to let designing a restart service distract us from the suspension discussion.

Issuing a 'suspend' command sounds like a third option.

So far we have:

1) run Accumulo in a mode that ignores long timeouts (perhaps enabled just before suspension)
2) let Accumulo die (no modification to Accumulo) and rely on a to-be-created restart service
3) issue a command to suspend processes before suspending the VM / OS

Perhaps the 'suspend' command just enables ignorance of timeouts, but if you're gonna issue a command, you might as well just issue the 'shutdown' command. 

What's the start-up time like for large clusters nowadays?

Also, what is the effect of taking all tables offline? 

On Feb 15, 2012, at 12:12 PM, David Medinets wrote:

> It seems like the conversation has wandered away from the main point -
> marking a node as suspended instead of having a monitoring service
> discover that it is non-responsive. Would it be possible to issue a
> command-line 'suspend' command, and then a 'resume' command when the
> user is ready to have the node back in the cluster?

Re: Suspension

Posted by John Vines <jo...@ugov.gov>.

Perhaps we want a suspend option which provides the ZK timeouts one large
skew before it expects normal behavior again?

John
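
[Editor's sketch] One reading of the "one large skew" idea is a liveness
check that normally enforces the usual timeout, but can be armed to
tolerate a single oversized gap before going back to normal. The sketch
below is a toy model under that assumption; it does not use or reflect
ZooKeeper's actual session handling, and all names are hypothetical.

```python
class LivenessChecker:
    """Toy timeout check that can forgive one large gap after a suspend."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.grace_s = 0.0  # one-time extra allowance, zero unless armed

    def suspend(self, max_gap_s):
        """Arm a single allowance for a gap of up to max_gap_s seconds."""
        self.grace_s = max_gap_s

    def is_alive(self, seconds_since_heartbeat):
        if seconds_since_heartbeat <= self.timeout_s:
            return True
        if seconds_since_heartbeat <= self.timeout_s + self.grace_s:
            # The single large skew has now been used up; the next
            # oversized gap is treated as a real failure.
            self.grace_s = 0.0
            return True
        return False
```

With a 30-second timeout, an hour-long gap fails the check unless
suspend() was called first, and it is forgiven only once.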

On Wed, Feb 15, 2012 at 12:20 PM, Aaron Cordova <aa...@cordovas.org> wrote:

> Yeah, we don't want to let designing a restart service distract us from
> the suspension discussion.
>
> Issuing a 'suspend' command sounds like a third option.
>
> So far we have:
>
> 1) run Accumulo in a mode that ignores long timeouts (perhaps enabled just
> before suspension)
> 2) let Accumulo die (no modification to Accumulo) and rely on a
> to-be-created restart service
> 3) issue a command to suspend processes before suspending the VM / OS
>
> Perhaps the 'suspend' command just enables ignorance of timeouts, but if
> you're gonna issue a command, you might as well just issue the 'shutdown'
> command.
>
> What's the start-up time like for large clusters nowadays?
>
> Also, what is the effect of taking all tables offline?
>
> On Feb 15, 2012, at 12:12 PM, David Medinets wrote:
>
> > It seems like the conversation has wandered away from the main point -
> > marking a node as suspended instead of having a monitoring service
> > discover that it is non-responsive. Would it be possible to issue a
> > command-line 'suspend' command, and then a 'resume' command when the
> > user is ready to have the node back in the cluster?
>
>

Re: Suspension

Posted by David Medinets <da...@gmail.com>.

It seems like the conversation has wandered away from the main point -
marking a node as suspended instead of having a monitoring service
discover that it is non-responsive. Would it be possible to issue a
command-line 'suspend' command, and then a 'resume' command when the
user is ready to have the node back in the cluster?
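
[Editor's sketch] The suspend/resume idea amounts to recording an explicit
"suspended" state, so that the monitor skips such nodes instead of
evicting them for being unresponsive. A minimal model, with hypothetical
names and no connection to any existing Accumulo command:

```python
class ClusterView:
    """Toy registry of node states as a monitor might track them."""

    def __init__(self):
        self.state = {}  # node name -> "active" or "suspended"

    def register(self, node):
        self.state[node] = "active"

    def suspend(self, node):
        if node not in self.state:
            raise KeyError(node)
        self.state[node] = "suspended"

    def resume(self, node):
        if node not in self.state:
            raise KeyError(node)
        self.state[node] = "active"

    def nodes_to_evict(self, unresponsive):
        """Only active nodes that stopped responding are eviction candidates."""
        return sorted(n for n in unresponsive if self.state.get(n) == "active")
```

A node marked suspended is left alone however long it stays silent, and
becomes subject to normal monitoring again once resume() is called.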

Re: Suspension

Posted by Adam Fuchs <ad...@ugov.gov>.

I think we would start out by enumerating the cases in which processes die
and we want them to stay dead, and then consider the repercussions of
trying to restart them in those cases. What cases can you think of in this
space? Here's my short list:
1. Logger dies due to running out of disk space. Restarting it should be
safe because it checks this condition every time it starts?
2. A node is behaving "wonkily" and we choose to remove it from the
cluster. In a manual override condition we can just kill the restarting
daemon. That would take care of restarting assuming we can log in on that
node. If we can't log in, this could be accomplished through a decommission
list in Zookeeper that the restarter checks before trying to launch.
3. A tablet server or logger gets overburdened and can't keep up with its
load. As long as we wait for the cluster to rebalance, this should lead to
a better balanced cluster.

This is by no means a complete list, so please add to it.

Adam
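
[Editor's sketch] Case 2 above suggests a decommission list that the
restarter consults before launching anything. The check itself could be
as small as the function below. The list source is passed in as a plain
callable; in practice it might read the children of a ZooKeeper node, but
the path, layout, and function names here are assumptions for
illustration only.

```python
def should_launch(host, list_decommissioned):
    """Return True unless `host` appears on the decommission list.

    `list_decommissioned` is any callable returning an iterable of host
    names. It stands in for a lookup against a shared store; nothing here
    reflects an existing Accumulo or ZooKeeper layout.
    """
    try:
        decommissioned = set(list_decommissioned())
    except Exception:
        # If the list cannot be read, err on the side of not restarting,
        # so a node removed on purpose is never brought back by accident.
        return False
    return host not in decommissioned
```

Failing closed when the list is unreadable is a design choice made for
this sketch; a restarter aimed at laptops might prefer the opposite.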

On Wed, Feb 15, 2012 at 11:16 AM, John Vines <jo...@ugov.gov> wrote:

> There are too many cases where a node legitimately died and we do not want
> it constantly coming back and bogging things down. How do you design it to
> restart the accidental deaths but not the deserved ones?
> On Feb 15, 2012 11:11 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:
>
>> This isn't really just a laptop problem. We also see hiccups in clusters
>> (admins accidentally the whole network, etc.) that we would want to
>> automatically recover from. I think having self-restarting processes could
>> be generally useful.
>>
>> I think that an option of not using zookeeper timeouts might lead to
>> abuse, and could be very bad for stability under rare failure modes. We
>> make a lot of assumptions throughout the code about these timeouts, and we
>> would have to reconsider a large part of that model.
>>
>> Adam
>>
>>
>> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
>> billie.j.rinaldi@ugov.gov> wrote:
>>
>>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
>>> aaron@cordovas.org> wrote:
>>> > Such an option would have to be very conspicuous so that users don't
>>> > accidentally enable it and then wonder why bad tablet servers aren't
>>> > removed automatically from the cluster.
>>>
>>> We could call it laptop.mode.
>>>
>>> Billie
>>>
>>
>>

Re: Suspension

Posted by John Vines <jo...@ugov.gov>.

There are too many cases where a node legitimately died and we do not want
it constantly coming back and bogging things down. How do you design it to
restart the accidental deaths but not the deserved ones?
On Feb 15, 2012 11:11 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:

> This isn't really just a laptop problem. We also see hiccups in clusters
> (admins accidentally the whole network, etc.) that we would want to
> automatically recover from. I think having self-restarting processes could
> be generally useful.
>
> I think that an option of not using zookeeper timeouts might lead to
> abuse, and could be very bad for stability under rare failure modes. We
> make a lot of assumptions throughout the code about these timeouts, and we
> would have to reconsider a large part of that model.
>
> Adam
>
>
> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
> billie.j.rinaldi@ugov.gov> wrote:
>
>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
>> aaron@cordovas.org> wrote:
>> > Such an option would have to be very conspicuous so that users don't
>> > accidentally enable it and then wonder why bad tablet servers aren't
>> > removed automatically from the cluster.
>>
>> We could call it laptop.mode.
>>
>> Billie
>>
>
>

Re: Suspension

Posted by Billie J Rinaldi <bi...@ugov.gov>.

On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <aa...@cordovas.org> wrote:
> Such an option would have to be very conspicuous so that users don't
> accidentally enable it and then wonder why bad tablet servers aren't
> removed automatically from the cluster.

We could call it laptop.mode.

Billie

Re: Suspension

Posted by Aaron Cordova <aa...@cordovas.org>.

Such an option would have to be very conspicuous so that users don't accidentally enable it and then wonder why bad tablet servers aren't removed automatically from the cluster.

It would also require some thought to make sure that large gaps in all processes' consciousnesses (5 s's in that word!) don't cause other undesirable effects.


On Feb 15, 2012, at 10:31 AM, John Vines wrote:

> That sounds too hacky. Why not just have a config option for whether zk timeouts are heeded?
> 
> On Feb 15, 2012 10:26 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:
> I think this makes a lot of sense. I use Accumulo enough on a laptop to be annoyed at how often I have to run start-all.sh.
> 
> One way we could do this is to have a separate daemon process restart accumulo processes anytime they go down. I think log recovery is almost as efficient as any other way of suspending memory to disk, and it doesn't add any extra complexity to the code base. The only other concern is having the daemon restart a process that should actually be down, and we would have to work out the model for that.
> 
> Adam
> 
> 
> On Wed, Feb 15, 2012 at 9:54 AM, Aaron Cordova <aa...@cordovas.org> wrote:
> EC2 as well as laptop users would be interested in making Accumulo 'suspendable'. The self-monitoring features end up killing off processes upon awakening. Perhaps this could be implemented by a simple switch that tells Accumulo not to worry about abandoning processes that don't report, that can be enabled before suspension and disabled after .. or simply left enabled for stand-alone laptop users.
> 
> Does it make sense to make it possible to suspend a running Accumulo instance, or should this simply be discouraged and made well known?
> 
>

Re: Suspension

Posted by John Vines <jo...@ugov.gov>.

That sounds too hacky. Why not just have a config option for whether zk
timeouts are heeded?
On Feb 15, 2012 10:26 AM, "Adam Fuchs" <ad...@ugov.gov> wrote:

> I think this makes a lot of sense. I use Accumulo enough on a laptop to be
> annoyed at how often I have to run start-all.sh.
>
> One way we could do this is to have a separate daemon process restart
> accumulo processes anytime they go down. I think log recovery is almost as
> efficient as any other way of suspending memory to disk, and it doesn't add
> any extra complexity to the code base. The only other concern is having the
> daemon restart a process that should actually be down, and we would have to
> work out the model for that.
>
> Adam
>
>
> On Wed, Feb 15, 2012 at 9:54 AM, Aaron Cordova <aa...@cordovas.org> wrote:
>
>> EC2 as well as laptop users would be interested in making Accumulo
>> 'suspendable'. The self-monitoring features end up killing off processes
>> upon awakening. Perhaps this could be implemented by a simple switch that
>> tells Accumulo not to worry about abandoning processes that don't report,
>> that can be enabled before suspension and disabled after .. or simply left
>> enabled for stand-alone laptop users.
>>
>> Does it make sense to make it possible to suspend a running Accumulo
>> instance, or should this simply be discouraged and made well known?
>>
>>
>

Suspension

Posted by Aaron Cordova <aa...@cordovas.org>.

EC2 as well as laptop users would be interested in making Accumulo 'suspendable'. The self-monitoring features end up killing off processes upon awakening. Perhaps this could be implemented by a simple switch that tells Accumulo not to worry about abandoning processes that don't report, that can be enabled before suspension and disabled after .. or simply left enabled for stand-alone laptop users.

Does it make sense to make it possible to suspend a running Accumulo instance, or should this simply be discouraged and made well known?

Re: I put my machine to sleep, and now I've a problem

Posted by Eric Newton <er...@gmail.com>.

Anything interesting displayed on the monitor pages?  Do you have a logger
and tablet server running?  Any logged errors?

-Eric

On Wed, Feb 15, 2012 at 9:40 AM, Benson Margulies <bi...@gmail.com> wrote:

> I have an accumulo install on my laptop. (1.3.5) Yesterday, without
> stopping any processes, I closed the cover to put it to sleep.
>
> Today, I came back to find that the master wasn't running. So I did
> something fairly dumb, and ran start-all again. Here's what happened.
> Trying to run stop-all.sh now gets
>
> 15 09:37:35,950 [impl.ThriftTransportPool] WARN : Thread "admin" stuck
> on IO  to localhost:9999:9999 (0) for at least 120203 ms
>
> I guess I'll just start firing up kill commands and hope for the best.
>
> 15 09:32:15,536 [shell.Shell] ERROR:
> org.apache.accumulo.core.client.AccumuloException:
> org.apache.thrift.transport.TTransportException: Failed to connect to
> a server
> █▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:16P
> /opt/accumulo-1.3.5-incubating/ jps
> 2780 Main
> 2086 QuorumPeerMain
> 1990 DataNode
> 13478 Jps
> 2072 SecondaryNameNode
> 1907 NameNode
> 2894 Main
> 3014
> █▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:23P
> /opt/accumulo-1.3.5-incubating/ bin/start-all.sh
> Starting tablet servers and loggers .... done
> Starting logger on localhost
> Starting tablet server on localhost
> 2012-02-15 09:32:31.647 java[13558:df03] Unable to load realm info
> from SCDynamicStore
> 15 09:32:31,813 [security.UserGroupInformation] INFO : JAAS
> Configuration already set up for Hadoop, not re-installing.
> Starting master on localhost
> Starting garbage collector on localhost
> localhost : monitor already running (2780)
> localhost : tracer already running (2894)
> █▓▒░benson@tinfoilhat░▒▓██▓▒░ Wed Feb 15 09:32:36P
> /opt/accumulo-1.3.5-incubating/ bin/accumulo shell -u root
> 2012-02-15 09:32:43.356 java[14170:df03] Unable to load realm info
> from SCDynamicStore
> 15 09:32:43,467 [security.UserGroupInformation] INFO : JAAS
> Configuration already set up for Hadoop, not re-installing.
> Enter current password for 'root'@'default': ******
>
> Shell - Accumulo Interactive Shell
> -
> - version: 1.3.5-incubating
> - instance name: default
> - instance id: 0f20de2f-a1b7-47dc-a877-740f901b22db
> -
> - type 'help' for a list of available commands
> -
> root@default> droptable entity
> 15 09:34:52,478 [impl.ThriftTransportPool] WARN : Thread "shell" stuck
> on IO  to localhost:9999:9999 (0) for at least 120211 ms
>

Re: I put my machine to sleep, and now I've a problem

Posted by Billie J Rinaldi <bi...@ugov.gov>.

On Thursday, February 23, 2012 10:12:07 AM, "David Medinets" <da...@gmail.com> wrote:
> Press ^C and you'll get an option to force shutdown in a clean manner.

Yes, this probably would have helped Benson with 1.3.  To clarify, 1.4 did not exhibit the stuck behavior; it started up normally.

Billie

> 
> On Thu, Feb 23, 2012 at 9:53 AM, Billie J Rinaldi
> <bi...@ugov.gov> wrote:
> > On Wednesday, February 15, 2012 9:40:45 AM, "Benson Margulies"
> > <bi...@gmail.com>:
> >> I have an accumulo install on my laptop. (1.3.5) Yesterday, without
> >> stopping any processes, I closed the cover to put it to sleep.
> >>
> >> Today, I came back to find that the master wasn't running. So I did
> >> something fairly dumb, and ran start-all again. Here's what
> >> happened.
> >> Trying to run stop-all.sh now gets
> >>
> >> 15 09:37:35,950 [impl.ThriftTransportPool] WARN : Thread "admin"
> >> stuck
> >> on IO to localhost:9999:9999 (0) for at least 120203 ms
> >
> > I just did the same thing with 1.4.0-SNAPSHOT: closed my laptop
> > yesterday, then came back today and ran start-all. Accumulo appears
> > to be working fine. Perhaps it's already somewhat more robust to
> > suspend?
> >
> > Billie

Re: I put my machine to sleep, and now I've a problem

Posted by David Medinets <da...@gmail.com>.

Press ^C and you'll get an option to force shutdown in a clean manner.

On Thu, Feb 23, 2012 at 9:53 AM, Billie J Rinaldi
<bi...@ugov.gov> wrote:
> On Wednesday, February 15, 2012 9:40:45 AM, "Benson Margulies" <bi...@gmail.com>:
>> I have an accumulo install on my laptop. (1.3.5) Yesterday, without
>> stopping any processes, I closed the cover to put it to sleep.
>>
>> Today, I came back to find that the master wasn't running. So I did
>> something fairly dumb, and ran start-all again. Here's what happened.
>> Trying to run stop-all.sh now gets
>>
>> 15 09:37:35,950 [impl.ThriftTransportPool] WARN : Thread "admin" stuck
>> on IO to localhost:9999:9999 (0) for at least 120203 ms
>
> I just did the same thing with 1.4.0-SNAPSHOT: closed my laptop yesterday, then came back today and ran start-all.  Accumulo appears to be working fine.  Perhaps it's already somewhat more robust to suspend?
>
> Billie

Re: I put my machine to sleep, and now I've a problem

Posted by Billie J Rinaldi <bi...@ugov.gov>.

On Wednesday, February 15, 2012 9:40:45 AM, "Benson Margulies" <bi...@gmail.com>:
> I have an accumulo install on my laptop. (1.3.5) Yesterday, without
> stopping any processes, I closed the cover to put it to sleep.
> 
> Today, I came back to find that the master wasn't running. So I did
> something fairly dumb, and ran start-all again. Here's what happened.
> Trying to run stop-all.sh now gets
> 
> 15 09:37:35,950 [impl.ThriftTransportPool] WARN : Thread "admin" stuck
> on IO to localhost:9999:9999 (0) for at least 120203 ms

I just did the same thing with 1.4.0-SNAPSHOT: closed my laptop yesterday, then came back today and ran start-all.  Accumulo appears to be working fine.  Perhaps it's already somewhat more robust to suspend?

Billie