Posted to dev@lucene.apache.org by "Martijn Koster (JIRA)" <ji...@apache.org> on 2018/12/21 12:55:00 UTC

[jira] [Updated] (SOLR-13089) bin/solr's use of lsof has some issues

     [ https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn Koster updated SOLR-13089:
----------------------------------
    Attachment: 0001-SOLR-13089-lsof-fixes.patch

> bin/solr's use of lsof has some issues
> --------------------------------------
>
>                 Key: SOLR-13089
>                 URL: https://issues.apache.org/jira/browse/SOLR-13089
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCLI
>            Reporter: Martijn Koster
>            Priority: Minor
>         Attachments: 0001-SOLR-13089-lsof-fixes.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
> h2. 1. False negatives when port is occupied by different user
> When {{lsof}} runs as non-root, it only shows sockets for processes with your effective uid.
>  For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
> #### works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc      26580  mak    3u  IPv4 2818104      0t0  TCP *:7788 (LISTEN)
> #### fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
> #### works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> sshd    2524 root    3u  IPv4  18426      0t0  TCP *:22 (LISTEN)
> sshd    2524 root    4u  IPv6  18428      0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
>  So if a process owned by a different user occupies that port, you will get a false negative: the check finds no listener and concludes that Solr is not running, even though the port is taken.
>  I can't think of a good way to fix or work around that (short of not using {{lsof}} in the first place).
>  Perhaps an uncommon scenario we need not worry too much about.
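One way to sidestep the uid restriction, sketched here under assumptions, is to probe the port with a TCP connection instead of asking lsof. The helper name port_in_use and the default host are hypothetical and not part of bin/solr. The sketch relies on bash's /dev/tcp redirection, so it needs a bash built with network redirections, and it only detects a listener reachable on the address it probes.

```shell
# Hypothetical helper, not part of bin/solr: succeeds if something accepts
# TCP connections on the given port, regardless of which user owns it.
# Assumes bash with /dev/tcp support; the host defaults to 127.0.0.1.
port_in_use() {
  local port=$1 host=${2:-127.0.0.1}
  # Open the connection in a subshell so the descriptor is closed again
  # as soon as the subshell exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example use, mirroring the check in the description:
# if ! port_in_use "$SOLR_PORT"; then echo "port $SOLR_PORT looks free"; fi
```

Unlike lsof, this opens a real connection to whatever is listening, and it has no timeout of its own if packets are silently dropped, so it may not be a drop-in replacement.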
> h2. 2. lsof can complain about lack of /etc/passwd entries
> If {{lsof}} runs without the current effective user having an entry in {{/etc/passwd}},
>  it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 8888
> COMMAND PID     USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 8888
> java      9     8888  115u  IPv4 2813503      0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 8888
> lsof: no pwd entry for UID 8888
> {noformat}
> You can avoid this by using the {{-t}} option, which tells {{lsof}} to produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible arguments
> On Alpine, {{busybox}} implements {{lsof}}, but does not support the arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1	/bin/busybox	/dev/pts/0
> 1	/bin/busybox	/dev/pts/0
> 1	/bin/busybox	/dev/pts/0
> 1	/bin/busybox	/dev/tty
> {noformat}
> Because this output is non-empty whether or not anything is listening on the port, if you ran Solr in the background and it failed to start, this check would produce a false positive.
>  For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> then in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha  sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]  
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $ 
> {noformat}
> and in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/solr-8983-console.log 
> Unrecognized option: --invalid
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {noformat}
> So the script reports that Solr is running when it isn't.
> Now, all this can be avoided by just installing the real {{lsof}} with {{apk add lsof}} which works properly. So should we detect and warn? Or even refuse to run rather than invoke a tool that does not implement the contract we expect?
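A possible detect-and-warn step could look like the sketch below. It assumes that the busybox applet is installed as a symlink to a binary named busybox (as on Alpine) and that readlink -f is available. The function names is_busybox_applet and warn_if_busybox_lsof are hypothetical and not part of bin/solr.

```shell
# Hypothetical helper: succeeds if the given path resolves to a binary
# named "busybox", which is how applet symlinks look on Alpine.
is_busybox_applet() {
  local resolved
  resolved=$(readlink -f "$1" 2>/dev/null) || return 1
  [ "$(basename "$resolved")" = "busybox" ]
}

# Hypothetical wrapper: warn (and return non-zero) when the lsof found on
# PATH is the busybox applet rather than the real lsof.
warn_if_busybox_lsof() {
  local lsof_path
  lsof_path=$(command -v lsof) || return 0   # no lsof on PATH: nothing to check
  if is_busybox_applet "$lsof_path"; then
    echo "WARNING: $lsof_path is the busybox applet; install the real lsof (e.g. 'apk add lsof')." >&2
    return 1
  fi
  return 0
}
```

Whether the caller should only warn or refuse to continue when warn_if_busybox_lsof fails is the open question raised above; the sketch leaves that decision to the caller.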
> h2. 4. Shellcheck dislikes backticks
> Shellcheck says {{SC2006: Use $(..) instead of legacy `..`.}}
>  Now, shellcheck complains about 130 other issues too, so it's a drop in the bucket, but if we're changing things, we might as well fix that.
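As a rough illustration, and not necessarily what the attached patch does, the check could combine the -t option from section 2 with $(..) command substitution. The function names listener_pids and port_is_free are hypothetical, and discarding stderr is an extra precaution assumed here rather than something the description asks for.

```shell
# Hypothetical helper: print the PIDs listening on a TCP port, or nothing.
# -t gives terse output (PIDs only, no header); stderr is discarded so that
# warnings such as "no pwd entry for UID ..." cannot leak into the result.
listener_pids() {
  lsof -t -PniTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# Hypothetical helper: succeeds if lsof reports no listener on the port.
port_is_free() {
  [ -z "$(listener_pids "$1")" ]
}

# Example use, replacing the backtick form quoted in the description:
# if port_is_free "$SOLR_PORT"; then echo "port $SOLR_PORT is not in use"; fi
```

This still inherits the limits from sections 1 and 3: a non-root lsof cannot see other users' sockets, and the busybox applet ignores these arguments.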


