Posted to dev@lucene.apache.org by "Martijn Koster (JIRA)" <ji...@apache.org> on 2018/12/21 12:55:00 UTC
[jira] [Updated] (SOLR-13089) bin/solr's use of lsof has some issues
[ https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn Koster updated SOLR-13089:
----------------------------------
Attachment: 0001-SOLR-13089-lsof-fixes.patch
> bin/solr's use of lsof has some issues
> --------------------------------------
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCLI
> Reporter: Martijn Koster
> Priority: Minor
> Attachments: 0001-SOLR-13089-lsof-fixes.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when port is occupied by different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your effective uid.
> For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
> #### works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> nc 26580 mak 3u IPv4 2818104 0t0 TCP *:7788 (LISTEN)
> #### fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
> #### works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> sshd 2524 root 3u IPv4 18426 0t0 TCP *:22 (LISTEN)
> sshd 2524 root 4u IPv6 18428 0t0 TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
> So if a process owned by a different user occupies that port, you will get a false negative: the check reports that nothing is listening on the port even though something is.
> I can't think of a good way to fix or work around that (short of not using {{lsof}} in the first place).
> Perhaps an uncommon scenario we need not worry too much about.
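One way to avoid the uid restriction described above is to probe the port with a TCP connect instead of asking {{lsof}}. This is only a sketch under assumptions and is not taken from the attached patch. The function name {{port_is_listening}} is hypothetical. The probe relies on bash's {{/dev/tcp}} redirection, which some bash builds disable. It also tests whether something accepts connections on the given address, which is not quite the same as listing sockets in the LISTEN state.

```shell
# Hypothetical helper, not part of bin/solr.
# Returns 0 if a TCP connect to host:port succeeds, non-zero otherwise.
# The connect is made by a child bash, so the socket is closed when it exits.
port_is_listening() {
  local host="$1" port="$2"
  bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

A caller might use it as {{port_is_listening localhost "$SOLR_PORT" && echo "port in use"}}. Because the result does not depend on who owns the listening process, it behaves the same for root and non-root users.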
> h2. 2. lsof can complain about lack of /etc/passwd entries
> If {{lsof}} runs without the current effective user having an entry in {{/etc/passwd}},
> it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0 bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 8888
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 8888
> java 9 8888 115u IPv4 2813503 0t0 TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 8888
> lsof: no pwd entry for UID 8888
> {noformat}
> You can avoid this by using the {{-t}} option, which makes {{lsof}} produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
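As a rough illustration of how the check could look with {{-t}}, here is a minimal sketch. It is not the code from the attached patch. The function names {{listening_pids}} and {{port_has_listener}} are hypothetical, and the {{2>/dev/null}} redirect is an extra assumption of this sketch. That redirect also hides genuine {{lsof}} errors, so it may not be wanted.

```shell
# Hypothetical helpers, not the code from the attached patch.
# Prints the PIDs listening on the given TCP port, one per line.
listening_pids() {
  local port="$1"
  # -t: terse output, PIDs only; stderr is discarded as a second safeguard
  lsof -t -PniTCP:"$port" -sTCP:LISTEN 2>/dev/null
}

# Returns 0 if lsof reported at least one PID for the port.
port_has_listener() {
  local pids
  pids=$(listening_pids "$1")   # $(..) rather than legacy backticks
  [ -n "$pids" ]
}
```

With these in place, the original test would read {{if ! port_has_listener "$SOLR_PORT"; then}}. The same uid limitation from section 1 still applies, since {{lsof}} is still doing the work.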
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible arguments
> On Alpine, {{busybox}} implements {{lsof}}, but does not support the arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/tty
> {noformat}
> So if you start Solr in the background and it fails to start, this check produces a false positive.
> For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/solr-8983-console.log
> Unrecognized option: --invalid
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {noformat}
> so it is saying Solr is running, when it isn't.
> Now, all this can be avoided by just installing the real {{lsof}} with {{apk add lsof}} which works properly. So should we detect and warn? Or even refuse to run rather than invoke a tool that does not implement the contract we expect?
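If detection is the route taken, one possible approach is to check whether the {{lsof}} on the PATH resolves to the busybox binary. This is only a sketch. It assumes that Alpine installs {{lsof}} as a symlink to {{/bin/busybox}} and that {{readlink -f}} is available, which holds for GNU coreutils and busybox but not for every platform. The function name {{lsof_is_busybox}} is hypothetical.

```shell
# Hypothetical helper, not part of bin/solr.
# Returns 0 if the given lsof path (default: the lsof found on PATH)
# resolves to a busybox binary, 1 if it does not, 2 if it cannot be resolved.
lsof_is_busybox() {
  local lsof_path resolved
  lsof_path=${1:-$(command -v lsof)}
  [ -n "$lsof_path" ] || return 2
  # Follow symlinks to the real executable
  resolved=$(readlink -f "$lsof_path") || return 2
  case "${resolved##*/}" in
    busybox*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could then warn or refuse to continue when {{lsof_is_busybox}} returns 0, for example by printing a hint to run {{apk add lsof}}.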
> h2. 4. Shellcheck dislikes backticks
> Shellcheck says {{SC2006: Use $(..) instead of legacy `..`.}}
> Now, shellcheck complains about 130 other issues too, so it's a drop in the bucket, but if we're changing things, we might as well fix that.