Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2015/04/30 07:07:06 UTC
[jira] [Updated] (DRILL-2917) Drillbit process fails to restart with address-already-in-use error due to unclean shutdown
[ https://issues.apache.org/jira/browse/DRILL-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Girish updated DRILL-2917:
-----------------------------------
Description:
On a 4-node cluster, some Drillbits fail to come up, complaining that the address is already in use.
No previous Drillbit process was listed as running by `jps`, yet the Web UI continued to show all Drillbits as up.
{code}
# jps
<No Drillbit Process>
# /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh stop
no drillbit to stop because no pid file /opt/mapr/drill/drill-0.9.0/drillbit.pid
# /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh restart
no drillbit to stop because no pid file /opt/mapr/drill/drill-0.9.0/drillbit.pid
starting drillbit, logging to /opt/mapr/drill/drill-0.9.0/logs/drillbit.out
# jps
<No Drillbit Process>
{code}
drillbit.out:
{code}
Exception in thread "main" org.apache.drill.exec.exception.DrillbitStartupException: Failure during initial startup of Drillbit.
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:87)
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:66)
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:166)
Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Could not bind Drillbit
    at org.apache.drill.exec.rpc.BasicServer.bind(BasicServer.java:158)
    at org.apache.drill.exec.service.ServiceEngine.start(ServiceEngine.java:65)
    at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:241)
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:84)
    ... 2 more
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
    at sun.nio.ch.Net.bind(Net.java:436)
    ...
    ...
{code}
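The `BindException` means some process still holds a port the Drillbit wants to bind. The sketch below is one way to check which ports already have a listener before attempting a restart. It is not part of `drillbit.sh`: the function names `port_in_use` and `report_ports` are hypothetical, it relies on bash's `/dev/tcp` redirection, and the port numbers in the usage comment (31010, 31011, 31012, 8047) are assumed Drill defaults that a deployment may override.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a TCP port already has a listener.

# Returns 0 if something accepts TCP connections on host:port, non-zero otherwise.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are required.
port_in_use() {
  local host="$1" port="$2"
  # The subshell opens the connection and closes it again when it exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Prints one status line per port given after the host argument.
report_ports() {
  local host="$1"; shift
  local port
  for port in "$@"; do
    if port_in_use "$host" "$port"; then
      echo "port ${port}: in use"
    else
      echo "port ${port}: free"
    fi
  done
}

# Example (assumed default Drill ports; adjust to the local configuration):
#   report_ports 127.0.0.1 31010 31011 31012 8047
```

A port reported as in use while `jps` shows no Drillbit suggests a leftover process. Tools such as `ss -ltnp` or `lsof -iTCP:<port> -sTCP:LISTEN` can then show which PID owns the socket.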
It turns out the Drillbit failed to shut down correctly and an internal process was still running.
{code}
# ps -ef |grep drill
mapr 2807 1 0 Apr25 ? 00:00:00 bash /opt/mapr/drill/drill-0.9.0/bin/drillbit.sh internal_start drillbit
mapr 2862 2807 0 Apr25 ? 00:18:54 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/bin/java -Dlog.path=/opt/mapr/drill/drill-0.9.0/log/drillbit.log -Xms1G -Xmx16G -XX:MaxDirectMemorySize=48G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=1G -ea -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.client=false -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -cp /opt/mapr/drill/drill-0.9.0/conf:/opt/mapr/drill/drill-0.9.0/jars/*:/opt/mapr/drill/drill-0.9.0/jars/ext/*:/opt/mapr/drill/drill-0.9.0/jars/3rdparty/*:/opt/mapr/drill/drill-0.9.0/jars/classb/* org.apache.drill.exec.server.Drillbit
{code}
Killing this process allowed the Drillbits to come up on all nodes.
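The manual workaround can be sketched as a small helper that asks a leftover process to exit and escalates only if it does not. This is an illustration under stated assumptions, not what `drillbit.sh` does: the function name `stop_pid` and the 10-second default timeout are hypothetical, and the PID must be found separately, since no pid file exists in this situation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a leftover process by PID.
# Sends SIGTERM first, then SIGKILL if the process is still alive
# after the timeout (in seconds, default 10).
stop_pid() {
  local pid="$1" timeout="${2:-10}" waited=0
  # If the signal cannot be delivered, the process is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 only checks that the PID still exists; it sends no signal.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null || true
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example, using the JVM PID from the ps output above:
#   stop_pid 2862 30
```

Because `jps` did not list the process here, the PID has to come from `ps -ef` or from something like `pgrep -f org.apache.drill.exec.server.Drillbit`, which matches on the main class name visible in the command line.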
> Drillbit process fails to restart with address-already-in-use error due to unclean shutdown
> -------------------------------------------------------------------------------------------
>
> Key: DRILL-2917
> URL: https://issues.apache.org/jira/browse/DRILL-2917
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI
> Affects Versions: 0.9.0
> Reporter: Abhishek Girish
> Assignee: Daniel Barclay (Drill)