Posted to dev@sling.apache.org by "Andrea Biardi (JIRA)" <ji...@apache.org> on 2017/03/06 15:49:32 UTC

[jira] [Created] (SLING-6615) sling listener crashes - cannot reliably start sling

Andrea Biardi created SLING-6615:
------------------------------------

             Summary: sling listener crashes - cannot reliably start sling
                 Key: SLING-6615
                 URL: https://issues.apache.org/jira/browse/SLING-6615
             Project: Sling
          Issue Type: Bug
         Environment: Linux (CentOS 5.x / Fedora 22), Java 8 update 111
            Reporter: Andrea Biardi
         Attachments: log, sling_test.sh


In my production environment I use a script to upload content to a sling instance. Occasionally, the uploads (done with curl) fail with "Connection refused" errors.

Suspecting a problem with my own customizations, I wrote a shell script from scratch, using the discover-sling-in-15-minutes webpage as a reference. I'm attaching the script (careful: it will kill any existing sling.launchpad process).

I may be doing something wrong here, but this is what I notice:
- During startup, sling returns a lot of spurious HTTP errors (401, 403, 503); I believe this is expected, as sling is still booting up.
- When I'm finally able to upload some content and try to retrieve it, I seem to hit some kind of race condition that shuts down the listener.
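
For anyone trying to reproduce this, the two cases above (HTTP errors while booting versus a refused connection) can be told apart from curl's exit code and the reported HTTP status. The following is only a minimal sketch, not the attached sling_test.sh: the function names are hypothetical, and treating 401/403/503 as "still booting" is an assumption based on the observation above (with wrong credentials, 401 would of course persist).

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the attached script.

# classify_probe RC CODE
#   RC:   curl's exit status
#   CODE: HTTP status as printed by curl -w '%{http_code}'
classify_probe() {
    rc="$1"
    code="$2"
    if [ "$rc" -eq 7 ]; then
        echo "refused"          # curl exit 7: failed to connect to host
    elif [ "$rc" -ne 0 ]; then
        echo "error"            # any other curl failure (timeout, DNS, ...)
    else
        case "$code" in
            2??|3??)     echo "ready" ;;
            401|403|503) echo "booting" ;;   # assumption: seen only during startup
            *)           echo "other" ;;
        esac
    fi
}

# probe_sling URL: run one request and print the resulting state.
probe_sling() {
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    classify_probe "$?" "$code"
}

# wait_for_sling URL TRIES: poll until "ready" or until TRIES is exhausted.
wait_for_sling() {
    url="$1"
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        [ "$(probe_sling "$url")" = "ready" ] && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}
```

Something like `wait_for_sling http://localhost:8080/ 60` before the first upload would separate "not ready yet" from "listener gone" (the URL is an assumption; only port 8080 is taken from this report). A "refused" state that shows up after the instance has already answered "ready" is the failure described here.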

The crash doesn't happen every time, but it is fairly frequent (roughly 1 in every 5 to 10 runs). When it does, netstat reports no process listening on port 8080, although the java PID is still alive. It looks like the listener has crashed, and I have to kill the PID and restart sling.

Attached is a log of the output from a problematic run (./sling_test 2>&1 | tee log); as you can see towards the end of the file, the loop gets stuck on "Connection refused" errors and never recovers.

At the prompt, I can verify that the sling.launchpad process is still running; however, it's no longer listening on port 8080:

[root@xxxxxxxx sl8]# ps axf | grep launchpad
 3486 pts/0    S+     0:00          \_ grep --color=auto launchpad
 3076 pts/0    Sl     0:21 java -jar org.apache.sling.launchpad-8.jar
[root@xxxxxxxx sl8]# cat /proc/3076/cmdline
java-jarorg.apache.sling.launchpad-8.jar
[root@xxxxxxxx sl8]# netstat -lnp | grep 8080
[root@xxxxxxxx sl8]#
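
The manual check in the transcript above (process alive, nothing listening) could be scripted so a test loop can detect this state and stop instead of retrying forever. This is a rough sketch under assumptions: the function names are hypothetical, it relies on the Linux net-tools `netstat -ln` column layout (local address in column 4, state in column 6), and `kill -0` only works for processes the caller is allowed to signal.

```shell
#!/bin/sh
# Hypothetical helpers for detecting "java PID alive, port not listening".

# port_in_listen_output PORT
#   Reads `netstat -ln` style output on stdin and succeeds if a TCP socket
#   in LISTEN state is bound to PORT on any local address.
port_in_listen_output() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1; exit }
        END { exit !found }
    '
}

# listener_state PID PORT
#   Prints one of: process-dead, listening, alive-not-listening.
listener_state() {
    pid="$1"
    port="$2"
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process-dead"
    elif netstat -ln 2>/dev/null | port_in_listen_output "$port"; then
        echo "listening"
    else
        echo "alive-not-listening"   # the state shown in the transcript
    fi
}
```

With the PID from the transcript, `listener_state 3076 8080` would be expected to print `alive-not-listening` in the failed state; the check does not verify that the listening socket actually belongs to that PID.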

Am I doing something wrong, or is this a known issue? If it is, is there a workaround?


