Posted to dev@lucene.apache.org by "Martijn Koster (JIRA)" <ji...@apache.org> on 2018/12/21 12:53:00 UTC
[jira] [Created] (SOLR-13089) bin/solr's use of lsof has some issues
Martijn Koster created SOLR-13089:
-------------------------------------
Summary: bin/solr's use of lsof has some issues
Key: SOLR-13089
URL: https://issues.apache.org/jira/browse/SOLR-13089
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCLI
Reporter: Martijn Koster
The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr port is being listened on:
{noformat}
running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
if [ -z "$running" ]; then
{noformat}
The code is [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
There are a few issues with this.
h2. 1. False negatives when port is occupied by different user
When {{lsof}} runs as non-root, it only shows sockets for processes with your effective uid.
For example:
{noformat}
$ id -u && nc -l 7788 &
[1] 26576
1000
#### works: nc ran as my user
$ lsof -PniTCP:7788 -sTCP:LISTEN
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nc 26580 mak 3u IPv4 2818104 0t0 TCP *:7788 (LISTEN)
#### fails: ssh is running as root
$ lsof -PniTCP:22 -sTCP:LISTEN
#### works if we are root
$ sudo lsof -PniTCP:22 -sTCP:LISTEN
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sshd 2524 root 3u IPv4 18426 0t0 TCP *:22 (LISTEN)
sshd 2524 root 4u IPv6 18428 0t0 TCP *:22 (LISTEN)
{noformat}
Solr runs as non-root.
So if a process owned by a different user occupies that port, you get a false negative: the check concludes that nothing is listening on the port (and hence that Solr is not running) even though the port is in use.
I can't think of a good way to fix or work around that (short of not using {{lsof}} in the first place).
Perhaps an uncommon scenario we need not worry too much about.
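For illustration, one way of "not using {{lsof}}" is to probe the port by trying to connect to it, which does not depend on who owns the listening process. This is a minimal sketch of a different technique, not what {{bin/solr}} does. It assumes a bash built with {{/dev/tcp}} redirection support; {{port_accepts_connections}} is a hypothetical helper name.

```shell
# Hypothetical helper, not part of bin/solr.
# Succeeds if something accepts TCP connections on 127.0.0.1:$1.
# Assumes bash with /dev/tcp redirection support compiled in.
port_accepts_connections() {
  local port="$1"
  # The redirection runs in a subshell, so fd 3 is closed again as soon
  # as the subshell exits; a refused connection makes the subshell fail.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}
```

A caller could then write {{if port_accepts_connections "$SOLR_PORT"; then ...}}. Note that this only probes the IPv4 loopback address, so a listener bound solely to another interface would still be missed.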
h2. 2. lsof can complain about lack of /etc/passwd entries
If {{lsof}} runs without the current effective user having an entry in {{/etc/passwd}},
it produces a warning on stderr:
{noformat}
$ docker run -d -u 0 solr:7.6.0 bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
$ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
lsof: no pwd entry for UID 8888
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 8888
java 9 8888 115u IPv4 2813503 0t0 TCP *:8983 (LISTEN)
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
lsof: no pwd entry for UID 8888
lsof: no pwd entry for UID 8888
{noformat}
You can avoid this by using the {{-t}} option, which specifies that {{lsof}} should produce terse output with process identifiers only and no header:
{noformat}
I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
9
{noformat}
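As a rough sketch of how the port check could look with {{-t}}: the helper names {{listener_pids}} and {{port_is_free}} are hypothetical and not part of {{bin/solr}}; the {{lsof}} options are the ones used in the session above.

```shell
# Hypothetical helpers, not part of bin/solr.

# Print the PIDs of processes listening on TCP port $1, one per line.
# -t asks lsof for terse, PID-only output without a header.
listener_pids() {
  lsof -t -PniTCP:"$1" -sTCP:LISTEN
}

# Succeed if lsof reports no listener on TCP port $1.
port_is_free() {
  [ -z "$(listener_pids "$1")" ]
}
```

The existing test would then read {{if port_is_free "$SOLR_PORT"; then ...}}. This still inherits the per-user visibility limitation from section 1.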
This is a rare circumstance, but one I encountered and worked around.
h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible arguments
On Alpine, {{busybox}} implements {{lsof}}, but does not support the arguments, so you get:
{noformat}
$ docker run -it alpine sh
/ # lsof -t -PniTCP:8983 -sTCP:LISTEN
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/tty
{noformat}
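Because busybox's {{lsof}} ignores the arguments and always lists open files, the output is never empty. So if you ran Solr in the background and it failed to start, this code would produce a false positive.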
For example:
{noformat}
docker volume create mysol
docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
apk add procps bash
tar xvzf /solr-7.6.0.tgz
chown -R 8983:8983 .
{noformat}
then in a separate terminal:
{noformat}
$ docker exec -it -u 8983 serene_saha sh
/mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
whoami: unknown uid 8983
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=101). Happy searching!
/mysol $
{noformat}
and in another separate terminal:
{noformat}
$ docker exec -it thirsty_liskov bash
bash-4.4$ cat server/logs/solr-8983-console.log
Unrecognized option: --invalid
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{noformat}
So the script reports that Solr is running when it isn't.
Now, all this can be avoided by just installing the real {{lsof}} with {{apk add lsof}} which works properly. So should we detect and warn? Or even refuse to run rather than invoke a tool that does not implement the contract we expect?
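One possible way to detect this, sketched under assumptions: on Alpine the {{lsof}} applet is a symlink to the busybox binary, so resolving the symlink and checking the file name would identify it. {{lsof_is_busybox}} and the warning text are hypothetical, and the sketch assumes a {{readlink}} that supports {{-f}}.

```shell
# Hypothetical helper, not part of bin/solr.
# Succeeds if the lsof found on PATH resolves to a busybox binary.
lsof_is_busybox() {
  local lsof_path resolved
  lsof_path=$(command -v lsof) || return 1    # no lsof on PATH at all
  # Follow the symlink chain to the real file (assumes readlink -f exists).
  resolved=$(readlink -f "$lsof_path") || return 1
  [ "$(basename "$resolved")" = "busybox" ]
}

if lsof_is_busybox; then
  # Illustrative message only; the wording is an assumption.
  echo "WARNING: lsof is provided by busybox and ignores the options used here; install the real lsof (e.g. apk add lsof)" >&2
fi
```

Whether to warn or refuse to run is left open here; the sketch only shows the detection step. A busybox that is installed as a copy rather than a symlink would not be caught by this check.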
h2. 4. Shellcheck dislikes backticks
Shellcheck says {{SC2006: Use $(..) instead of legacy `..`.}}
Now, shellcheck complains about 130 other issues too, so it's a drop in the bucket, but if we're changing things, we might as well fix that.
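A small generic illustration of the difference (not taken from {{bin/solr}}; the variable names are made up): both forms capture command output, but {{$(..)}} nests without escaping and is easier to quote.

```shell
# Legacy form: the inner backticks have to be escaped to nest.
legacy_dir=`dirname \`command -v sh\``

# Preferred form: nests naturally, and the inner result can be quoted.
modern_dir=$(dirname "$(command -v sh)")
```

Both variables end up holding the directory that contains {{sh}}.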