Posted to dev@lucene.apache.org by "Martijn Koster (JIRA)" <ji...@apache.org> on 2018/12/21 12:53:00 UTC

[jira] [Created] (SOLR-13089) bin/solr's use of lsof has some issues

Martijn Koster created SOLR-13089:
-------------------------------------

             Summary: bin/solr's use of lsof has some issues
                 Key: SOLR-13089
                 URL: https://issues.apache.org/jira/browse/SOLR-13089
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCLI
            Reporter: Martijn Koster


The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr port is being listened on:
{noformat}
running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
if [ -z "$running" ]; then
{noformat}
The code is [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].

There are a few issues with this.
h2. 1. False negatives when port is occupied by different user

When {{lsof}} runs as non-root, it only shows sockets for processes with your effective uid.
 For example:
{noformat}
$ id -u && nc -l 7788 &
[1] 26576
1000

#### works: nc ran as my user
$ lsof -PniTCP:7788 -sTCP:LISTEN
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
nc      26580  mak    3u  IPv4 2818104      0t0  TCP *:7788 (LISTEN)

#### fails: ssh is running as root
$ lsof -PniTCP:22 -sTCP:LISTEN

#### works if we are root
$ sudo lsof -PniTCP:22 -sTCP:LISTEN
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
sshd    2524 root    3u  IPv4  18426      0t0  TCP *:22 (LISTEN)
sshd    2524 root    4u  IPv6  18428      0t0  TCP *:22 (LISTEN)
{noformat}
Solr runs as non-root.
 So if a process owned by a different user occupies that port, you get a false negative: the check reports that nothing is listening on the port even though something is.
 I can't think of a good way to fix or work around that (short of not using {{lsof}} in the first place).
 Perhaps this is an uncommon scenario that we need not worry too much about.
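One possible way to avoid {{lsof}} for this check, at least on Linux, is to read the kernel's TCP socket tables, which list listening sockets regardless of which user owns them. This is only a minimal sketch, assuming a Linux {{/proc}} filesystem; {{port_is_listening}} is a hypothetical helper, not something that exists in {{bin/solr}}:

```shell
# Sketch: detect a TCP listener without lsof by reading the Linux socket tables.
# port_is_listening is an illustrative name; the optional extra arguments let
# you point it at other table files (assumption: Linux /proc layout).
port_is_listening() {
  port=$1
  shift
  # Default to the kernel's IPv4 and IPv6 TCP socket tables.
  [ "$#" -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
  hex_port=$(printf '%04X' "$port")
  for table in "$@"; do
    [ -r "$table" ] || continue
    # Field 2 is local_address (HEXIP:HEXPORT); field 4 is the state, 0A = LISTEN.
    if awk -v p=":$hex_port" \
        '$4 == "0A" && substr($2, length($2) - 4) == p { found = 1 }
         END { exit !found }' "$table"; then
      return 0
    fi
  done
  return 1
}
```

Something like {{port_is_listening "$SOLR_PORT"}} could then stand in for the {{lsof}} test on Linux. It only tells you that a listener exists, not which process owns it, and it would not help on macOS or other platforms without {{/proc}}.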
h2. 2. lsof can complain about lack of /etc/passwd entries

If {{lsof}} runs without the current effective user having an entry in {{/etc/passwd}},
 it produces a warning on stderr:
{noformat}
$ docker run -d -u 0 solr:7.6.0  bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6

$ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
lsof: no pwd entry for UID 8888
COMMAND PID     USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 8888
java      9     8888  115u  IPv4 2813503      0t0  TCP *:8983 (LISTEN)
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
lsof: no pwd entry for UID 8888
lsof: no pwd entry for UID 8888
{noformat}
You can avoid this by using the {{-t}} option, which specifies that {{lsof}} should produce terse output with process identifiers only and no header:
{noformat}
I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
9
{noformat}
This is a rare circumstance, but one I encountered and worked around.
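A sketch of how the check might look with {{-t}} applied. The function name {{listener_pids}} is hypothetical, and the {{2>/dev/null}} is an extra precaution of mine rather than something the output above shows to be necessary:

```shell
# Sketch only: listener_pids is an illustrative name, not a function in bin/solr.
listener_pids() {
  # -t prints bare PIDs with no header; 2>/dev/null discards anything lsof
  # still writes to stderr (assumption: losing those messages is acceptable).
  lsof -t -PniTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

SOLR_PORT=${SOLR_PORT:-8983}
if [ -z "$(listener_pids "$SOLR_PORT")" ]; then
  echo "nothing is listening on port $SOLR_PORT"
fi
```

Note that discarding stderr also hides genuine {{lsof}} errors, so it may be preferable to rely on {{-t}} alone.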
h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible arguments

On Alpine, {{busybox}} implements {{lsof}}, but does not support these arguments, so the output is non-empty regardless of what is listening:
{noformat}
$ docker run -it alpine sh
/ # lsof -t -PniTCP:8983 -sTCP:LISTEN
1	/bin/busybox	/dev/pts/0
1	/bin/busybox	/dev/pts/0
1	/bin/busybox	/dev/pts/0
1	/bin/busybox	/dev/tty
{noformat}
So if you start Solr in the background and it fails to start, this check produces a false positive.
 For example:
{noformat}
docker volume create mysol
docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
apk add procps bash
tar xvzf /solr-7.6.0.tgz
chown -R 8983:8983 .
{noformat}
then in a separate terminal:
{noformat}
$ docker exec -it -u 8983 serene_saha  sh
/mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
whoami: unknown uid 8983
Waiting up to 180 seconds to see Solr running on port 8983 [|]  
Started Solr server on port 8983 (pid=101). Happy searching!

/mysol $ 
{noformat}
and in another separate terminal:
{noformat}
$ docker exec -it thirsty_liskov bash

bash-4.4$ cat server/logs/solr-8983-console.log 
Unrecognized option: --invalid
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{noformat}
So the script reports that Solr is running when it isn't.

Now, all this can be avoided by just installing the real {{lsof}} with {{apk add lsof}}, which works properly. So should we detect and warn? Or even refuse to run rather than invoke a tool that does not implement the contract we expect?
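If we went the detect-and-warn route, one possible check is to see whether the {{lsof}} on the PATH resolves to the busybox binary. This is a sketch under the assumption that the busybox applet is installed as a symlink to a file named {{busybox}} and that {{readlink -f}} is available; {{is_busybox_applet}} is a hypothetical helper name:

```shell
# Sketch only: is_busybox_applet is an illustrative helper, not part of bin/solr.
is_busybox_applet() {
  # Resolve symlinks; a busybox applet is assumed to resolve to ".../busybox".
  resolved=$(readlink -f "$1" 2>/dev/null) || return 1
  [ "$(basename "$resolved")" = "busybox" ]
}

lsof_path=$(command -v lsof || true)
if [ -n "$lsof_path" ] && is_busybox_applet "$lsof_path"; then
  echo "WARNING: lsof is provided by busybox and does not support the options used here" >&2
fi
```

Whether this should only warn or should stop the script is the open question above; the sketch only warns.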

h2. 4. Shellcheck dislikes backticks

Shellcheck says {{SC2006: Use $(..) instead of legacy `..`.}}
 Now, shellcheck complains about 130 other issues too, so it's a drop in the bucket, but if we're changing things, we might as well fix that.
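For illustration, here is how the two forms compare when commands are nested. The path is just an example value, not anything taken from {{bin/solr}}:

```shell
# Legacy form: inner backticks must be escaped with backslashes.
legacy=`basename \`dirname /opt/solr/bin/solr\``

# $(...) form: nests without escaping and is easier to quote correctly.
modern=$(basename "$(dirname /opt/solr/bin/solr)")

echo "$legacy $modern"
```

Both assignments produce the same value; the second is simply easier to read and extend.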


