Posted to dev@sling.apache.org by Robert Munteanu <ro...@apache.org> on 2020/01/30 09:40:49 UTC

Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Hi Carlos,

Yes, this may be a timing issue.

I could not follow the link you sent me for some reason. I think the
build log is the one from [1]. If that is the case, the error is
visible at shutdown, and probably does not have the same root cause.

I'd still like to get some steps to reproduce - even if it's a chance
of 1 in 3, even if it depends on containers.

Thanks,
Robert


[1]: https://builds.apache.org/job/Sling/job/sling-org-apache-sling-starter/job/master/104/console

On Wed, 2020-01-29 at 21:27 -0500, Carlos Munoz wrote:
> Robert, I checked the latest (master) pipeline build logs for the
> starter
> project:
> 
> https://builds.apache.org/blue/organizations/jenkins/Sling%2Fsling-org-apache-sling-starter/detail/master/104/pipeline/24
> 
> and found that there is a very similar error being reported
> (different
> principal and bundle), but same type of exception nonetheless.
> 
> Carlos
> 
> 
> On Wed, Jan 29, 2020 at 7:47 PM Carlos Munoz <ca...@redhat.com>
> wrote:
> 
> > Robert, I wonder if this is a timing issue. I’m not sure I
> > understand how
> > Sling is loading bundles and configurations, but is it possible
> > that it
> > could load up a bundle which needs a specific configuration before
> > said
> > configuration has finished loading?
> > 
> > I mention this because we are seeing the error now on a
> > containerized
> > environment where resources may be more virtualized than in a local
> > environment, where the application seems to run without any issues.
> > 
> > Regards,
> > 
> > Carlos
> > 
> > On Tue, Jan 28, 2020 at 10:11 PM Carlos Munoz <ca...@redhat.com>
> > wrote:
> > 
> > > Hi Robert, I'm picking up this thread again since we briefly
> > > talked about
> > > this problem; allow me to recap:
> > > We are attempting to migrate bundle versions for a Sling
> > > application from
> > > their Sling 11 versions to the latest stable versions. The
> > > application is
> > > running against an already populated mongo database and we are
> > > seeing the
> > > following exception when deploying.
> > > 
> > > 29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup
> > > Thread
> > > #4] ERROR: Bundle '160' EventDispatcher: Error during dispatch.
> > > (org.apache.sling.api.SlingException: Can't create the JCR event
> > > listener.)
> > > org.apache.sling.api.SlingException: Can't create the JCR event
> > > listener.
> > > at
> > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > ider.registerListeners(JcrResourceProvider.java:227)
> > > 
> > > at
> > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > ider.start(JcrResourceProvider.java:182)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Handler.activate(ResourceProviderHandler.java:74)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Tracker.activate(ResourceProviderTracker.java:360)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Tracker.register(ResourceProviderTracker.java:192)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Tracker.access$200(ResourceProviderTracker.java:59)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Tracker$1.addingService(ResourceProviderTracker.java:130)
> > > 
> > > at
> > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > Tracker$1.addingService(ResourceProviderTracker.java:106)
> > > 
> > > at
> > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > viceTracker.java:943)
> > > 
> > > at
> > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > viceTracker.java:871)
> > > 
> > > at
> > > org.osgi.util.tracker.AbstractTracked.trackAdding(AbstractTracked
> > > .java:256)
> > > at
> > > org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:
> > > 229)
> > > at
> > > org.osgi.util.tracker.ServiceTracker$Tracked.serviceChanged(Servi
> > > ceTracker.java:903)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > allback(EventDispatcher.java:990)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > ventDispatcher.java:838)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > Dispatcher.java:545)
> > > 
> > > at
> > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > )
> > > at
> > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > at
> > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > leContextImpl.java:328)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > ster(AbstractComponentManager.java:906)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > ster(AbstractComponentManager.java:892)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.RegistrationManager.changeRegis
> > > tration(RegistrationManager.java:128)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.AbstractComponentManager.regist
> > > erService(AbstractComponentManager.java:959)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.AbstractComponentManager.activa
> > > teInternal(AbstractComponentManager.java:732)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > ustomizer.addedService(DependencyManager.java:1045)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > ustomizer.addedService(DependencyManager.java:999)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > erAdded(ServiceTracker.java:1216)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > erAdded(ServiceTracker.java:1137)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > trackAdding(ServiceTracker.java:944)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > track(ServiceTracker.java:880)
> > > 
> > > at
> > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceC
> > > hanged(ServiceTracker.java:1168)
> > > 
> > > at
> > > org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.s
> > > erviceChanged(BundleComponentActivator.java:125)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > allback(EventDispatcher.java:990)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > ventDispatcher.java:838)
> > > 
> > > at
> > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > Dispatcher.java:545)
> > > 
> > > at
> > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > )
> > > at
> > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > at
> > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > leContextImpl.java:328)
> > > 
> > > at
> > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.register
> > > Service(AbstractSlingRepositoryManager.java:218)
> > > 
> > > at
> > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.initiali
> > > zeAndRegisterRepositoryService(AbstractSlingRepositoryManager.jav
> > > a:541)
> > > 
> > > at
> > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.access$3
> > > 00(AbstractSlingRepositoryManager.java:92)
> > > 
> > > at
> > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager$4.run(Ab
> > > stractSlingRepositoryManager.java:496)
> > > 
> > > Caused by: javax.jcr.LoginException: Can neither derive user name
> > > nor
> > > principal names for bundle org.apache.sling.jcr.resource [154]
> > > and sub
> > > service observation
> > > at
> > > org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(A
> > > bstractSlingRepository2.java:387)
> > > 
> > > at
> > > org.apache.sling.jcr.resource.internal.JcrListenerBaseConfig.<ini
> > > t>(JcrListenerBaseConfig.java:62)
> > > 
> > > at
> > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > ider.registerListeners(JcrResourceProvider.java:218)
> > > 
> > > ... 41 more
> > > 
> > > 
> > > The application deploys fine when not running against mongo, or
> > > when
> > > running against a clean mongo instance.
> > > 
> > > The changes are located here for reference:
> > > 
> > > https://github.com/redhataccess/pantheon/pull/219/files#diff-e93a9e4b7b62ab20d546f78f9ac775c8L33
> > > 
> > > Any ideas on what could be going wrong?
> > > 
> > > Regards,
> > > 
> > > Carlos
> > > 
> > > 
> > > 
> > > On Mon, Jan 27, 2020 at 4:57 AM Robert Munteanu <
> > > rombert@apache.org>
> > > wrote:
> > > 
> > > > Happy to hear that you got it sorted out! Feel free to come
> > > > back with
> > > > more questions if you have any.
> > > > 
> > > > Thanks,
> > > > Robert
> > > > 
> > > > On Fri, 2020-01-24 at 10:58 -0500, Carlos Munoz wrote:
> > > > > Thanks Robert. I think we actually found out what was going
> > > > > on: it
> > > > > seems we
> > > > > have a poorly defined index which was being deployed as part
> > > > > of our
> > > > > bundle
> > > > > and which was interfering with some of the other indexes. As
> > > > > soon as
> > > > > we
> > > > > removed it everything started working once again. We are
> > > > > working on a
> > > > > better index for the query right now.
> > > > > 
> > > > > Really appreciate your willingness to help here... ++
> > > > > 
> > > > > On Fri, Jan 24, 2020 at 5:03 AM Robert Munteanu <
> > > > > rombert@apache.org>
> > > > > wrote:
> > > > > 
> > > > > > I tried building the app from source code but did not
> > > > > > reproduce the
> > > > > > problem. I guess this matches your experience - this
> > > > > > happens only
> > > > > > during an 'upgrade'.
> > > > > > 
> > > > > > Can you please give me a set of steps to reproduce? Ideally
> > > > > > without
> > > > > > MongoDB, but if that's required leave it in :-)
> > > > > > 
> > > > > > Thanks,
> > > > > > Robert
> > > > > > 
> > > > > > On Wed, 2020-01-22 at 22:08 -0500, Carlos Munoz wrote:
> > > > > > > I double checked and we do have the mapping. We copied
> > > > > > > all the
> > > > > > > provisioning
> > > > > > > files from the commit you recommended earlier [1] and
> > > > > > > deployed
> > > > > > > like
> > > > > > > that.
> > > > > > > 
> > > > > > > In fact, you can see our provisioning files here: [2] We
> > > > > > > are only
> > > > > > > adding a
> > > > > > > single file with our own bundle and configurations.
> > > > > > > 
> > > > > > > [1]
> > > > > > > 
> > > > https://github.com/apache/sling-org-apache-sling-starter/commit/c4f6e3b
> > > > > > > [2]
> > > > > > > 
> > > > https://github.com/redhataccess/pantheon/tree/upgrade-sling-bundles/pantheon-slingstart/src/main/provisioning
> > > > > > > On Wed, Jan 22, 2020 at 4:54 PM Robert Munteanu <
> > > > > > > rombert@apache.org>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Wed, 2020-01-22 at 16:16 -0500, Carlos Munoz wrote:
> > > > > > > > > Thanks for the tip Daniel!
> > > > > > > > > 
> > > > > > > > > Robert - we were able to successfully package the
> > > > > > > > > sling
> > > > > > > > > starter
> > > > > > > > > with
> > > > > > > > > the
> > > > > > > > > latest definitions as you pointed, but when deploying
> > > > > > > > > on top
> > > > > > > > > of
> > > > > > > > > an
> > > > > > > > > existing
> > > > > > > > > database we started getting a JCR error:
> > > > > > > > > 
> > > > > > > > > javax.jcr.LoginException: Can neither derive user
> > > > > > > > > name nor
> > > > > > > > > principal
> > > > > > > > > names
> > > > > > > > > for bundle org.apache.sling.jcr.resource [152] and
> > > > > > > > > sub
> > > > > > > > > service
> > > > > > > > > observation
> > > > > > > > > 
> > > > > > > > > We don't get the same error when deploying on a fresh
> > > > > > > > > database.
> > > > > > > > 
> > > > > > > > It seems that you have some missing service user
> > > > > > > > mappings.
> > > > > > > > Those
> > > > > > > > might
> > > > > > > > be required by newer versions of the bundles that you
> > > > > > > > just
> > > > > > > > consumed. In
> > > > > > > > the Sling Starter the current mapping is defined at
> > > > > > > > [1].
> > > > > > > > 
> > > > > > > > Does adding that as a configuration to your application
> > > > > > > > help?
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Robert
> > > > > > > > 
> > > > > > > > 
> > > > > > > > [1]:
> > > > > > > > 
> > > > https://github.com/apache/sling-org-apache-sling-starter/blob/7eac121fc3f00c95ef5b8ac38133f6796a4a6c08/src/main/provisioning/sling.txt#L199-L202
> > > > 
> > > > --
> > 
> > Carlos A. Muñoz
> > 
> > Manager, Software Engineering - Customer Platform
> > 
> > Red Hat <https://www.redhat.com>
> > <https://red.ht/sig>
> > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Yes, I confirm that these steps reproduce the problem for me. Can you
please file an issue under https://issues.apache.org/jira/browse/SLING
so we can better track this?

On Mon, 2020-02-17 at 11:24 -0500, Carlos Munoz wrote:
> Thanks for the information Robert.
> 
> To replicate the issue all I needed was a mongodb (I used a full
> replica
> set, see my instructions in a previous email about how to get one
> going
> using podman) and a single process running sling.
> 
> The problem does happen when I do the following:
> 
> 2. Start Sling instance A, wait for it to start
> 3. Stop Sling instance A, wait for it to stop
> 4. Start Sling instance B - Error
> 
> but let me add more
> 
> 5. Start Sling Instance A again - Success (note I didn't remove the
> sling
> dir)
> 6. Start Sling instance B again - Success (note I didn't remove the
> sling
> dir)
> 
> this means that even if Sling recreates the sling directory and fails
> the
> startup, next time it will succeed. Unfortunately we don't have that
> luxury
> in containers because the sling directory is not persisted.
> 
> I think this is a bug, but I'll keep playing with it a bit to see if
> I can
> find out more.
> 
> Carlos
> 
> 
> 
> 
> 
> 
> On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <ro...@apache.org>
> wrote:
> 
> > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > Robert I managed to replicate the issue in a local, non-
> > > containerized
> > > environment (!!!).
> > > 
> > > The problem seems to be when the database is kept but the 'sling'
> > > directory
> > > is cleared out across restarts (as it is for us when the
> > > container
> > > goes
> > > away). As I said before this doesn't seem to be a problem with
> > > the
> > > Sling 11
> > > bundles.
> > > 
> > > The first basic solution will be to persist the 'sling' directory
> > > across
> > > restarts, and I was wondering if this is a bug, or as designed.
> > 
> > I think this should work.
> > 
> > > I also wonder if once persisted, multiple containers could share
> > > this
> > > directory.
> > 
> > This directory can't be shared, as it holds runtime data related to
> > Sling. For instance, a bundle that is started in instance A could
> > be
> > starting on instance B.
> > 
> > There is at least one file ( sling.id ) that holds data that must
> > not
> > be the same between instances.
> > 
> > So I would advise marking the directory as container-private as
> > a
> > first step.
> > 
> > Robert
> > 
> > > Regards,
> > > 
> > > Carlos
> > > 
> > > 
> > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > > 
> > > > Thanks Robert (and once again I can't stress enough how
> > > > grateful I
> > > > am for
> > > > all your help).
> > > > 
> > > > Right now we deploy our container with the expectation that the
> > > > mongo db
> > > > is the only necessary state we need to keep; everything else is
> > > > throwaway.
> > > > This means that a totally new container connected to the
> > > > mongodb
> > > > should
> > > > pick up the state and run the same as the first time it was
> > > > fired
> > > > up. Do
> > > > you think this is an incorrect assumption? If so, what are
> > > > other
> > > > pieces of
> > > > state we should be keeping for subsequent restarts?
> > > > 
> > > > This assumption has worked well for us with the current sling
> > > > 11
> > > > release,
> > > > but it seems to break with the more up-to-date bundles. Perhaps
> > > > running
> > > > Sling in a container is just not meant to be.
> > > > 
> > > > Regards,
> > > > 
> > > > Carlos
> > > > 
> > > > 
> > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > rombert@apache.org>
> > > > wrote:
> > > > 
> > > > > Hi Carlos,
> > > > > 
> > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level logs
> > > > > > for
> > > > > > every
> > > > > > bundle? I tried passing a few configuration arguments from
> > > > > > the
> > > > > > command line
> > > > > > but nothing seemed to work.
> > > > > 
> > > > > Try configuring the LogManager to debug at
> > > > > 
> > > > > 
> > > > > 
> > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > Thanks,
> > > > > Robert
> > > > > 
> > > > > > Carlos
> > > > > > 
> > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > > bdelacretaz@apache.org>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > > camunoz@redhat.com>
> > > > > > > wrote:
> > > > > > > > ...Is there a reason why the Jcr repository could be
> > > > > > > > restarting?
> > > > > > > > And what
> > > > > > > > class could we start looking into to debug if this is
> > > > > > > > the
> > > > > > > > case?...
> > > > > > > 
> > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > components at
> > > > > > > startup,
> > > > > > > for various reasons.
> > > > > > > 
> > > > > > > The simplest way to detect and log multiple repository
> > > > > > > startups
> > > > > > > might
> > > > > > > be to implement a SlingRepositoryInitializer service [1]
> > > > > > > that's
> > > > > > > called
> > > > > > > at every startup, or use the logs of an existing one like
> > > > > > > the
> > > > > > > JCR
> > > > > > > RepositoryInitializer [2] if that has anything to process
> > > > > > > in
> > > > > > > your
> > > > > > > system.
> > > > > > > 
> > > > > > > -Bertrand
> > > > > > > 
> > > > > > > [1]
> > > > > > > 
> > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > [2]
> > > > > > > 
> > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Hi Carlos,

I think I found a solution, can you please check my latest comment on
SLING-9118 [1] ?

Thanks!
Robert

[1]: https://issues.apache.org/jira/browse/SLING-9118?focusedCommentId=17073183&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17073183

On Thu, 2020-03-26 at 17:46 -0400, Carlos Munoz wrote:
> Hi Robert,
> 
> I've found that it's not as simple. There is still some factor of
> randomness attached to this issue. After doing the bisect more times,
> I've
> found that commit 0a13d3467aa78b46ec33ae5687418685f90a9e12 seems to
> work
> *most* of the time. There are still times where I get the error, but
> it is
> recoverable on the next run.
> 
> Carlos
> 
> On Thu, Mar 19, 2020 at 6:21 AM Robert Munteanu <ro...@apache.org>
> wrote:
> 
> > That's good info, thank you! I've added some details to the Jira
> > issue.
> > I tried reverting the commits I suspect are at fault
> > 
> > - 
> > https://github.com/apache/sling-org-apache-sling-jcr-base/commit/6f5771a
> > - 
> > https://github.com/apache/sling-org-apache-sling-jcr-base/commit/3de2b9f
> > 
> > But that failed due to conflicts. I will try and manually remove
> > the
> > changes and see what that does.
> > Robert
> > 
> > On Wed, 2020-03-18 at 21:24 -0400, Carlos Munoz wrote:
> > > I went through the bisect process and I got the first bad commit:
> > > 
> > > commit bb1e10d97f3c163fb87917ea782afff674050891
> > > Author: Eric Norman <en...@apache.org>
> > > Date:   Sun Dec 16 12:33:08 2018 -0800
> > > 
> > >     switch to released JCR Base 3.0.6
> > > 
> > > (I tried it a couple of times just to be sure)
> > > 
> > > I tried running our app with the commit before that and I get it
> > > to
> > > run.
> > > (There are other unrelated problems).
> > > 
> > > 
> > > On Mon, Mar 16, 2020 at 6:12 PM Robert Munteanu <
> > > rombert@apache.org>
> > > wrote:
> > > 
> > > > Hi Carlos,
> > > > 
> > > > Apologies for the delay ...
> > > > 
> > > > > What I was thinking of doing myself, but did not have the time
> > > > > for, is the following:
> > > > 
> > > > 1. Find a version of Sling for which the scenario in SLING-9118
> > > > works.
> > > > Perhaps Sling Starter 11 is a good start.
> > > > 2. Run a `git bisect` check between sling starter 11 and the
> > > > current
> > > > master branch
> > > > 
> > > > Assuming my guess is correct, git would say
> > > > 
> > > > Bisecting: 36 revisions left to test after this (roughly 5
> > > > steps)
> > > > [c1aedf7b292f7835ceb4e2f56fedcb3294c60756] Update to Tika 1.21
> > > > 
> > > > So not that many steps to test.
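
The bisect procedure described above is a binary search over the commit history, which is why a few dozen candidate revisions need only a handful of test runs. A minimal Python sketch of the idea follows; it is an illustration and not how git itself is implemented. The function name first_bad and the is_bad callback are hypothetical, and is_bad stands in for "build this revision and run the SLING-9118 scenario against it".

```python
from typing import Callable, Sequence, Tuple, TypeVar

C = TypeVar("C")


def first_bad(commits: Sequence[C], is_bad: Callable[[C], bool]) -> Tuple[C, int]:
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest. The first entry is assumed to be
    known good and the last entry known bad, as with `git bisect good/bad`.
    Returns the first bad commit and the number of commits that were tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], steps
```

Each test halves the remaining range, so the number of runs grows with the logarithm of the number of revisions. If the failure is intermittent, a single run of is_bad can mislabel a revision, so repeating the check a few times per revision before deciding is a reasonable precaution.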
> > > > 
> > > > > If you manage to isolate the change to the starter that
> > > > broke
> > > > this, it would make it much easier to understand where the
> > > > problem
> > > > is
> > > > coming from.
> > > > 
> > > > Thanks!
> > > > Robert
> > > > 
> > > > On Mon, 2020-03-16 at 16:27 -0400, Carlos Munoz wrote:
> > > > > Hi Robert,
> > > > > 
> > > > > Just a friendly ping about this issue :)
> > > > > 
> > > > > We could try to submit a fix with some potential guidance
> > > > > from
> > > > > you.
> > > > > For
> > > > > example, which of the many Sling bundles should we start
> > > > > looking
> > > > > at?
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Carlos
> > > > > 
> > > > > 
> > > > > On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <
> > > > > camunoz@redhat.com>
> > > > > wrote:
> > > > > 
> > > > > > Thanks Robert. As always your help is appreciated.
> > > > > > 
> > > > > > On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu <
> > > > > > > > rombert@apache.org>
> > > > > > wrote:
> > > > > > 
> > > > > > > Thanks, Ben,
> > > > > > > 
> > > > > > > I added a bit more detail, based on our mailing list
> > > > > > > conversations.
> > > > > > > I'll have limited access in the next two weeks, but if no
> > > > > > > one
> > > > > > > picks it
> > > > > > > up I'll look into it when I get back.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Robert
> > > > > > > 
> > > > > > > On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > > > > > > > I went ahead and created
> > > > > > > > https://issues.apache.org/jira/browse/SLING-9118
> > > > > > > > for this. Although the ultimate goal here is
> > > > > > > > containerization,
> > > > > > > > I
> > > > > > > > neglected
> > > > > > > > to include any details to that effect in the ticket,
> > > > > > > > since
> > > > > > > > the
> > > > > > > > behavior is
> > > > > > > > reproducible without that being a complicating factor.
> > > > > > > > 
> > > > > > > > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <
> > > > > > > > rombert@apache.org>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > > > > > > > I am following along conceptually - I want to make
> > > > > > > > > > sure
> > > > > > > > > > I
> > > > > > > > > > understand
> > > > > > > > > > what's
> > > > > > > > > > being described.
> > > > > > > > > > 
> > > > > > > > > > Let's say Sling Instance A starts successfully the
> > > > > > > > > > first
> > > > > > > > > > time. If
> > > > > > > > > > we
> > > > > > > > > > restart Sling Instance A, we expect subsequent
> > > > > > > > > > restarts
> > > > > > > > > > to
> > > > > > > > > > also
> > > > > > > > > > succeed,
> > > > > > > > > > without removing the sling directory.
> > > > > > > > > > Now let's say Sling Instance B does NOT start
> > > > > > > > > > successfully
> > > > > > > > > > the
> > > > > > > > > > first
> > > > > > > > > > time.
> > > > > > > > > > Despite that, we expect subsequent restarts to
> > > > > > > > > > succeed
> > > > > > > > > > without
> > > > > > > > > > removing the
> > > > > > > > > > sling directory.
> > > > > > > > > > 
> > > > > > > > > > Correct so far?
> > > > > > > > > 
> > > > > > > > > Yes, correct.
> > > > > > > > > 
> > > > > > > > > > Assuming yes... what if this is running in k8s, and
> > > > > > > > > > k8s
> > > > > > > > > > sees that
> > > > > > > > > > Sling
> > > > > > > > > > Instance B did not start successfully, and kills
> > > > > > > > > > the
> > > > > > > > > > pod
> > > > > > > > > > (removing
> > > > > > > > > > all pod
> > > > > > > > > > resources, including that pod's sling directory) in
> > > > > > > > > > response?
> > > > > > > > > > Presumably,
> > > > > > > > > > k8s would then start Sling Instance C, which is a
> > > > > > > > > > fresh
> > > > > > > > > > instance
> > > > > > > > > > with
> > > > > > > > > > no
> > > > > > > > > > sling directory. Are we saying we expect C to have
> > > > > > > > > > a
> > > > > > > > > > 50/50
> > > > > > > > > > chance
> > > > > > > > > > of
> > > > > > > > > > starting successfully? Or have we observed
> > > > > > > > > > different
> > > > > > > > > > behavior?
> > > > > > > > > 
> > > > > > > > > I think that only the first instance starts
> > > > > > > > > successfully.
> > > > > > > > > Additional
> > > > > > > > > instances will not start unless they have a Sling
> > > > > > > > > directory
> > > > > > > > > set up.
> > > > > > > > > 
> > > > > > > > > I've tested with a third instance, once two instances
> > > > > > > > > are
> > > > > > > > > up,
> > > > > > > > > and
> > > > > > > > > it
> > > > > > > > > has the exact same behaviour.
> > > > > > > > > 
> > > > > > > > > One workaround that I can suggest for a containerized
> > > > > > > > > environment
> > > > > > > > > is to
> > > > > > > > > use a supervisor script that detects the abnormal
> > > > > > > > > startup
> > > > > > > > > problem
> > > > > > > > > and
> > > > > > > > > restarts Sling, so that it starts up successfully.
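
The supervisor workaround described above could be sketched roughly as follows in Python. This is only an outline under stated assumptions: the start and stop hooks are hypothetical callables supplied by whoever wires this into a container entrypoint, and the only thing taken from this thread is the error text from the stack trace. It relies on the 'sling' directory surviving between attempts inside the same container, since the thread reports that a second start on an existing directory succeeds.

```python
from typing import Callable, Iterable

# Error text taken from the LoginException quoted earlier in this thread.
FAILURE_MARKER = "Can neither derive user name nor principal names"


def startup_failed(log_lines: Iterable[str]) -> bool:
    """Return True if any startup log line contains the known error text."""
    return any(FAILURE_MARKER in line for line in log_lines)


def start_with_retry(
    start: Callable[[], Iterable[str]],
    stop: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Start Sling, restarting it while the startup log shows the failure.

    `start` (hypothetical) launches the instance, waits for startup to finish
    and returns the log lines written during startup. `stop` (hypothetical)
    shuts the instance down without deleting the sling directory.
    Returns the number of the attempt that started cleanly.
    """
    for attempt in range(1, max_attempts + 1):
        if not startup_failed(start()):
            return attempt
        stop()  # keep the sling directory so the next attempt can reuse it
    raise RuntimeError(f"Sling did not start cleanly after {max_attempts} attempts")
```

The attempt limit is there so that a genuinely broken configuration surfaces as a failure instead of restarting forever.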
> > > > > > > > > 
> > > > > > > > > Another would be to persist the 'sling' directory as
> > > > > > > > > a
> > > > > > > > > per-
> > > > > > > > > container
> > > > > > > > > volume. Not sure how easy that is with k8s, but maybe
> > > > > > > > > you
> > > > > > > > > can
> > > > > > > > > use a
> > > > > > > > > single ReadWriteMany volume at /sling, and each pod
> > > > > > > > > gets
> > > > > > > > > their own
> > > > > > > > > ${sling.home} at /sling/${containerId} ( assuming
> > > > > > > > > that is
> > > > > > > > > exposed
> > > > > > > > > through the downward API).
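
The per-container ${sling.home} layout described above could be derived at container start along these lines. This is a sketch under assumptions: POD_NAME is a hypothetical environment variable that a pod spec would have to populate (for example from metadata.name through the downward API), HOSTNAME is used as a fallback, and /sling is the shared mount point mentioned in the message. For the directory to be found again after a restart, the identifier has to be stable across restarts, which this sketch does not guarantee.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional


def sling_home(base: str = "/sling", env: Optional[Mapping[str, str]] = None) -> str:
    """Build a per-container sling.home path under a shared volume."""
    env = os.environ if env is None else env
    # POD_NAME is a hypothetical variable; HOSTNAME is the fallback.
    container_id = env.get("POD_NAME") or env.get("HOSTNAME")
    if not container_id:
        raise ValueError("no container identifier found in the environment")
    return str(PurePosixPath(base) / container_id)
```

The resulting path would then be passed to Sling as its home directory, so that no two instances share sling.id or other runtime data.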
> > > > > > > > > 
> > > > > > > > > > As these are workarounds, I would still very much
> > > > > > > > > like
> > > > > > > > > to
> > > > > > > > > see this
> > > > > > > > > fixed properly, so please file a bug to track this.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Robert
> > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > Ben
> > > > > > > > > > 
> > > > > > > > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <
> > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > Thanks for the information Robert.
> > > > > > > > > > > 
> > > > > > > > > > > To replicate the issue all I needed was a mongodb
> > > > > > > > > > > (I
> > > > > > > > > > > used
> > > > > > > > > > > a
> > > > > > > > > > > full
> > > > > > > > > > > replica
> > > > > > > > > > > set, see my instructions in a previous email
> > > > > > > > > > > about
> > > > > > > > > > > how to
> > > > > > > > > > > get
> > > > > > > > > > > one
> > > > > > > > > > > going
> > > > > > > > > > > using podman) and a single process running sling.
> > > > > > > > > > > 
> > > > > > > > > > > The problem does happen when I do the following:
> > > > > > > > > > > 
> > > > > > > > > > > 2. Start Sling instance A, wait for it to start
> > > > > > > > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > > > > > > > 4. Start Sling instance B - Error
> > > > > > > > > > > 
> > > > > > > > > > > but let me add more
> > > > > > > > > > > 
> > > > > > > > > > > 5. Start Sling Instance A again - Success (note I
> > > > > > > > > > > didn't
> > > > > > > > > > > remove
> > > > > > > > > > > the
> > > > > > > > > > > sling
> > > > > > > > > > > dir)
> > > > > > > > > > > 6. Start Sling instance B again - Success (note I
> > > > > > > > > > > didn't
> > > > > > > > > > > remove
> > > > > > > > > > > the
> > > > > > > > > > > sling
> > > > > > > > > > > dir)
> > > > > > > > > > > 
> > > > > > > > > > > this means that even if Sling recreates the sling
> > > > > > > > > > > directory and
> > > > > > > > > > > fails the
> > > > > > > > > > > startup, next time it will succeed. Unfortunately
> > > > > > > > > > > we
> > > > > > > > > > > don't have
> > > > > > > > > > > that luxury
> > > > > > > > > > > in containers because the sling directory is not
> > > > > > > > > > > persisted.
> > > > > > > > > > > 
> > > > > > > > > > > I think this is a bug, but I'll keep playing with
> > > > > > > > > > > it
> > > > > > > > > > > a
> > > > > > > > > > > bit to
> > > > > > > > > > > see
> > > > > > > > > > > if I can
> > > > > > > > > > > find out more.
> > > > > > > > > > > 
> > > > > > > > > > > Carlos
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > > > > > > > > > rombert@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > > > > > > > > Robert I managed to replicate the issue in a local, non-containerized
> > > > > > > > > > > > > environment (!!!).
> > > > > > > > > > > > >
> > > > > > > > > > > > > The problem seems to be when the database is kept but the 'sling' directory
> > > > > > > > > > > > > is cleared out across restarts (as it is for us when the container goes
> > > > > > > > > > > > > away). As I said before this doesn't seem to be a problem with the Sling 11
> > > > > > > > > > > > > bundles.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The first basic solution will be to persist the 'sling' directory across
> > > > > > > > > > > > > restarts, and I was wondering if this is a bug, or as designed.
> > > > > > > > > > > > 
> > > > > > > > > > > > I think this should work.
> > > > > > > > > > > > 
> > > > > > > > > > > > > I also wonder if once persisted, multiple containers could share this
> > > > > > > > > > > > > directory.
> > > > > > > > > > > > 
> > > > > > > > > > > > This directory can't be shared, as it holds runtime data related to
> > > > > > > > > > > > Sling. For instance, a bundle that is started in instance A could be
> > > > > > > > > > > > starting on instance B.
> > > > > > > > > > > >
> > > > > > > > > > > > There is at least one file ( sling.id ) that holds data that must not
> > > > > > > > > > > > be the same between instances.
> > > > > > > > > > > >
> > > > > > > > > > > > So I would advise marking the directory as container-private as a
> > > > > > > > > > > > first step.
> > > > > > > > > > > > 
> > > > > > > > > > > > Robert
> > > > > > > > > > > > 
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Carlos
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <camunoz@redhat.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks Robert (and once again I can't stress enough how grateful I am for
> > > > > > > > > > > > > > all your help).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Right now we deploy our container with the expectation that the mongo db
> > > > > > > > > > > > > > is the only necessary state we need to keep; everything else is throwaway.
> > > > > > > > > > > > > > This means that a totally new container connected to the mongodb should
> > > > > > > > > > > > > > pick up the state and run the same as the first time it was fired up. Do
> > > > > > > > > > > > > > you think this is an incorrect assumption? If so, what are other pieces of
> > > > > > > > > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This assumption has worked well for us with the current sling 11 release,
> > > > > > > > > > > > > > but it seems to break with the more up-to-date bundles. Perhaps running
> > > > > > > > > > > > > > Sling in a container is just not meant to be.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Carlos
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <rombert@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Carlos,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > > > > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level logs for every
> > > > > > > > > > > > > > > > bundle? I tried passing a few configuration arguments from the command line
> > > > > > > > > > > > > > > > but nothing seemed to work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Robert
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Carlos
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <bdelacretaz@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <camunoz@redhat.com> wrote:
> > > > > > > > > > > > > > > > > > ...Is there a reason why the Jcr repository could be restarting? And what
> > > > > > > > > > > > > > > > > > class could we start looking into to debug if this is the case?...
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It's not uncommon to see extra restarts of OSGi components at startup,
> > > > > > > > > > > > > > > > > for various reasons.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The simplest way to detect and log multiple repository startups might
> > > > > > > > > > > > > > > > > be to implement a SlingRepositoryInitializer service [1] that's called
> > > > > > > > > > > > > > > > > at every startup, or use the logs of an existing one like the JCR
> > > > > > > > > > > > > > > > > RepositoryInitializer [2] if that has anything to process in your
> > > > > > > > > > > > > > > > > system.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -Bertrand
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1] https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > > > > > > > [2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
> > 
> > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Hi Robert,

I've found that it's not that simple. There is still some randomness
attached to this issue. After running the bisect several more times, I've
found that commit 0a13d3467aa78b46ec33ae5687418685f90a9e12 seems to work
*most* of the time. There are still times when I get the error, but it is
recoverable on the next run.
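
Because the failure is intermittent, a single clean start at a given commit
does not show that the commit is good. Below is a minimal sketch of a wrapper
for `git bisect run` that repeats a check several times and reports the commit
as bad if any attempt fails. The check command, the script name and the number
of attempts are assumptions for illustration only; nothing like this was used
in the thread.

```python
#!/usr/bin/env python3
"""Repeat a flaky check so that `git bisect run` can classify a commit.

Assumption: the command passed on the command line (for example a
hypothetical ./check-startup.sh) exits 0 when Sling starts cleanly and
non-zero when the startup error appears.
"""
import subprocess
import sys

GOOD = 0  # `git bisect run` reads exit code 0 as "good"
BAD = 1   # and 1-127 (except 125, which means "skip") as "bad"


def classify(run_once, attempts=5):
    """Return BAD as soon as one attempt fails, GOOD if every attempt passes."""
    for _ in range(attempts):
        if run_once() != 0:
            return BAD
    return GOOD


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the check command.
    command = sys.argv[1:]
    sys.exit(classify(lambda: subprocess.run(command).returncode))
```

It could be invoked as `git bisect run python3 bisect_flaky.py ./check-startup.sh`
(both file names are placeholders). Repeating the check only lowers the chance
of labelling a bad commit as good: if a bad commit fails with probability p per
start, it still passes all n attempts with probability (1 - p)^n.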

Carlos

On Thu, Mar 19, 2020 at 6:21 AM Robert Munteanu <ro...@apache.org> wrote:

> That's good info, thank you! I've added some details to the Jira issue.
> I tried reverting the commits I suspect are at fault
>
> - https://github.com/apache/sling-org-apache-sling-jcr-base/commit/6f5771a
> - https://github.com/apache/sling-org-apache-sling-jcr-base/commit/3de2b9f
>
> But that failed due to conflicts. I will try and manually remove the
> changes and see what that does.
> Robert
>
> On Wed, 2020-03-18 at 21:24 -0400, Carlos Munoz wrote:
> > I went through the bisect process and I got the first bad commit:
> >
> > commit bb1e10d97f3c163fb87917ea782afff674050891
> > Author: Eric Norman <en...@apache.org>
> > Date:   Sun Dec 16 12:33:08 2018 -0800
> >
> >     switch to released JCR Base 3.0.6
> >
> > (I tried it a couple of times just to be sure)
> >
> > I tried running our app with the commit before that and I get it to
> > run.
> > (There are other unrelated problems).
> >
> >
> > On Mon, Mar 16, 2020 at 6:12 PM Robert Munteanu <ro...@apache.org>
> > wrote:
> >
> > > Hi Carlos,
> > >
> > > Apologies for the delay ...
> > >
> > > What I was thinking of doing myself, but did not have the time is
> > > the
> > > following
> > >
> > > 1. Find a version of Sling for which the scenario in SLING-9118
> > > works.
> > > Perhaps Sling Starter 11 is a good start.
> > > 2. Run a `git bisect` check between sling starter 11 and the
> > > current
> > > master branch
> > >
> > > Assuming my guess is correct, git would say
> > >
> > > Bisecting: 36 revisions left to test after this (roughly 5 steps)
> > > [c1aedf7b292f7835ceb4e2f56fedcb3294c60756] Update to Tika 1.21
> > >
> > > So not that many steps to test.
> > >
> > > If you would manage to isolate the change to the starter that broke
> > > this, it would make it much easier to understand where the problem
> > > is
> > > coming from.
> > >
> > > Thanks!
> > > Robert
> > >
> > > On Mon, 2020-03-16 at 16:27 -0400, Carlos Munoz wrote:
> > > > Hi Robert,
> > > >
> > > > Just a friendly ping about this issue :)
> > > >
> > > > We could try to submit a fix with some potential guidance from
> > > > you.
> > > > For
> > > > example, which of the many Sling bundles should we start looking
> > > > at?
> > > >
> > > > Regards,
> > > >
> > > > Carlos
> > > >
> > > >
> > > > On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <ca...@redhat.com>
> > > > wrote:
> > > >
> > > > > Thanks Robert. As always your help is appreciated.
> > > > >
> > > > > On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu <rombert@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Thanks, Ben,
> > > > > >
> > > > > > I added a bit more detail, based on our mailing list
> > > > > > conversations.
> > > > > > I'll have limited access in the next two weeks, but if no one
> > > > > > picks it
> > > > > > up I'll look into it when I get back.
> > > > > >
> > > > > > Thanks,
> > > > > > Robert
> > > > > >
> > > > > > On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > > > > > > I went ahead and created
> > > > > > > https://issues.apache.org/jira/browse/SLING-9118
> > > > > > > for this. Although the ultimate goal here is
> > > > > > > containerization,
> > > > > > > I
> > > > > > > neglected
> > > > > > > to include any details to that effect in the ticket, since
> > > > > > > the
> > > > > > > behavior is
> > > > > > > reproducible without that being a complicating factor.
> > > > > > >
> > > > > > > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <
> > > > > > > rombert@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > > > > > > I am following along conceptually - I want to make sure I understand what's
> > > > > > > > > being described.
> > > > > > > > >
> > > > > > > > > Let's say Sling Instance A starts successfully the first time. If we
> > > > > > > > > restart Sling Instance A, we expect subsequent restarts to also succeed,
> > > > > > > > > without removing the sling directory.
> > > > > > > > > Now let's say Sling Instance B does NOT start successfully the first time.
> > > > > > > > > Despite that, we expect subsequent restarts to succeed without removing the
> > > > > > > > > sling directory.
> > > > > > > > >
> > > > > > > > > Correct so far?
> > > > > > > >
> > > > > > > > Yes, correct.
> > > > > > > >
> > > > > > > > > Assuming yes... what if this is running in k8s, and k8s sees that Sling
> > > > > > > > > Instance B did not start successfully, and kills the pod (removing all pod
> > > > > > > > > resources, including that pod's sling directory) in response? Presumably,
> > > > > > > > > k8s would then start Sling Instance C, which is a fresh instance with no
> > > > > > > > > sling directory. Are we saying we expect C to have a 50/50 chance of
> > > > > > > > > starting successfully? Or have we observed different behavior?
> > > > > > > >
> > > > > > > > I think that only the first instance starts successfully. Additional
> > > > > > > > instances will not start unless they have a Sling directory set up.
> > > > > > > >
> > > > > > > > I've tested with a third instance, once two instances are up, and it
> > > > > > > > has the exact same behaviour.
> > > > > > > >
> > > > > > > > One workaround that I can suggest for a containerized environment is to
> > > > > > > > use a supervisor script that detects the abnormal startup problem and
> > > > > > > > restarts Sling, so that it starts up successfully.
> > > > > > > >
> > > > > > > > Another would be to persist the 'sling' directory as a per-container
> > > > > > > > volume. Not sure how easy that is with k8s, but maybe you can use a
> > > > > > > > single ReadWriteMany volume at /sling, and each pod gets its own
> > > > > > > > ${sling.home} at /sling/${containerId} ( assuming that is exposed
> > > > > > > > through the downward API).
> > > > > > > >
> > > > > > > > As these are workarounds, I would still very much like to see this
> > > > > > > > fixed properly, so please file a bug to track this.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Robert
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Ben
> > > > > > > > >
> > > > > > > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <camunoz@redhat.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the information Robert.
> > > > > > > > > >
> > > > > > > > > > To replicate the issue all I needed was a mongodb (I used a full replica
> > > > > > > > > > set, see my instructions in a previous email about how to get one going
> > > > > > > > > > using podman) and a single process running sling.
> > > > > > > > > >
> > > > > > > > > > The problem does happen when I do the following:
> > > > > > > > > >
> > > > > > > > > > 2. Start Sling instance A, wait for it to start
> > > > > > > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > > > > > > 4. Start Sling instance B - Error
> > > > > > > > > >
> > > > > > > > > > but let me add more
> > > > > > > > > >
> > > > > > > > > > 5. Start Sling Instance A again - Success (note I didn't remove the sling
> > > > > > > > > > dir)
> > > > > > > > > > 6. Start Sling instance B again - Success (note I didn't remove the sling
> > > > > > > > > > dir)
> > > > > > > > > >
> > > > > > > > > > this means that even if Sling recreates the sling directory and fails the
> > > > > > > > > > startup, next time it will succeed. Unfortunately we don't have that luxury
> > > > > > > > > > in containers because the sling directory is not persisted.
> > > > > > > > > >
> > > > > > > > > > I think this is a bug, but I'll keep playing with it a bit to see if I can
> > > > > > > > > > find out more.
> > > > > > > > > >
> > > > > > > > > > Carlos
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <rombert@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > > > > > > > Robert I managed to replicate the issue in a local, non-containerized
> > > > > > > > > > > > environment (!!!).
> > > > > > > > > > > >
> > > > > > > > > > > > The problem seems to be when the database is kept but the 'sling' directory
> > > > > > > > > > > > is cleared out across restarts (as it is for us when the container goes
> > > > > > > > > > > > away). As I said before this doesn't seem to be a problem with the Sling 11
> > > > > > > > > > > > bundles.
> > > > > > > > > > > >
> > > > > > > > > > > > The first basic solution will be to persist the 'sling' directory across
> > > > > > > > > > > > restarts, and I was wondering if this is a bug, or as designed.
> > > > > > > > > > >
> > > > > > > > > > > I think this should work.
> > > > > > > > > > >
> > > > > > > > > > > > I also wonder if once persisted, multiple containers could share this
> > > > > > > > > > > > directory.
> > > > > > > > > > >
> > > > > > > > > > > This directory can't be shared, as it holds runtime data related to
> > > > > > > > > > > Sling. For instance, a bundle that is started in instance A could be
> > > > > > > > > > > starting on instance B.
> > > > > > > > > > >
> > > > > > > > > > > There is at least one file ( sling.id ) that holds data that must not
> > > > > > > > > > > be the same between instances.
> > > > > > > > > > >
> > > > > > > > > > > So I would advise marking the directory as container-private as a
> > > > > > > > > > > first step.
> > > > > > > > > > >
> > > > > > > > > > > Robert
> > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Carlos
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <camunoz@redhat.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks Robert (and once again I can't stress enough how grateful I am for
> > > > > > > > > > > > > all your help).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Right now we deploy our container with the expectation that the mongo db
> > > > > > > > > > > > > is the only necessary state we need to keep; everything else is throwaway.
> > > > > > > > > > > > > This means that a totally new container connected to the mongodb should
> > > > > > > > > > > > > pick up the state and run the same as the first time it was fired up. Do
> > > > > > > > > > > > > you think this is an incorrect assumption? If so, what are other pieces of
> > > > > > > > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > > > > > > >
> > > > > > > > > > > > > This assumption has worked well for us with the current sling 11 release,
> > > > > > > > > > > > > but it seems to break with the more up-to-date bundles. Perhaps running
> > > > > > > > > > > > > Sling in a container is just not meant to be.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Carlos
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <rombert@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Carlos,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > > > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level logs for every
> > > > > > > > > > > > > > > bundle? I tried passing a few configuration arguments from the command line
> > > > > > > > > > > > > > > but nothing seemed to work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Robert
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Carlos
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <bdelacretaz@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <camunoz@redhat.com> wrote:
> > > > > > > > > > > > > > > > > ...Is there a reason why the Jcr repository could be restarting? And what
> > > > > > > > > > > > > > > > > class could we start looking into to debug if this is the case?...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It's not uncommon to see extra restarts of OSGi components at startup,
> > > > > > > > > > > > > > > > for various reasons.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The simplest way to detect and log multiple repository startups might
> > > > > > > > > > > > > > > > be to implement a SlingRepositoryInitializer service [1] that's called
> > > > > > > > > > > > > > > > at every startup, or use the logs of an existing one like the JCR
> > > > > > > > > > > > > > > > RepositoryInitializer [2] if that has anything to process in your
> > > > > > > > > > > > > > > > system.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -Bertrand
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > > > > > > [2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
That's good info, thank you! I've added some details to the Jira issue.
I tried reverting the commits I suspect are at fault:

- https://github.com/apache/sling-org-apache-sling-jcr-base/commit/6f5771a
- https://github.com/apache/sling-org-apache-sling-jcr-base/commit/3de2b9f

But that failed due to conflicts. I will try to remove the changes
manually and see what that does.

Robert

On Wed, 2020-03-18 at 21:24 -0400, Carlos Munoz wrote:
> I went through the bisect process and I got the first bad commit:
> 
> commit bb1e10d97f3c163fb87917ea782afff674050891
> Author: Eric Norman <en...@apache.org>
> Date:   Sun Dec 16 12:33:08 2018 -0800
> 
>     switch to released JCR Base 3.0.6
> 
> (I tried it a couple of times just to be sure)
> 
> I tried running our app with the commit before that and I get it to
> run.
> (There are other unrelated problems).
> 
> 
> On Mon, Mar 16, 2020 at 6:12 PM Robert Munteanu <ro...@apache.org>
> wrote:
> 
> > Hi Carlos,
> > 
> > Apologies for the delay ...
> > 
> > What I was thinking of doing myself, but did not have the time is
> > the
> > following
> > 
> > 1. Find a version of Sling for which the scenario in SLING-9118
> > works.
> > Perhaps Sling Starter 11 is a good start.
> > 2. Run a `git bisect` check between sling starter 11 and the
> > current
> > master branch
> > 
> > Assuming my guess is correct, git would say
> > 
> > Bisecting: 36 revisions left to test after this (roughly 5 steps)
> > [c1aedf7b292f7835ceb4e2f56fedcb3294c60756] Update to Tika 1.21
> > 
> > So not that many steps to test.
> > 
> > If you would manage to isolate the change to the starter that broke
> > this, it would make it much easier to understand where the problem
> > is
> > coming from.
> > 
> > Thanks!
> > Robert
> > 
> > On Mon, 2020-03-16 at 16:27 -0400, Carlos Munoz wrote:
> > > Hi Robert,
> > > 
> > > Just a friendly ping about this issue :)
> > > 
> > > We could try to submit a fix with some potential guidance from
> > > you.
> > > For
> > > example, which of the many Sling bundles should we start looking
> > > at?
> > > 
> > > Regards,
> > > 
> > > Carlos
> > > 
> > > 
> > > On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > > 
> > > > Thanks Robert. As always your help is appreciated.
> > > > 
> > > > On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu <rombert@apache.org>
> > > > wrote:
> > > > 
> > > > > Thanks, Ben,
> > > > > 
> > > > > I added a bit more detail, based on our mailing list
> > > > > conversations.
> > > > > I'll have limited access in the next two weeks, but if no one
> > > > > picks it
> > > > > up I'll look into it when I get back.
> > > > > 
> > > > > Thanks,
> > > > > Robert
> > > > > 
> > > > > On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > > > > > I went ahead and created
> > > > > > https://issues.apache.org/jira/browse/SLING-9118
> > > > > > for this. Although the ultimate goal here is
> > > > > > containerization,
> > > > > > I
> > > > > > neglected
> > > > > > to include any details to that effect in the ticket, since
> > > > > > the
> > > > > > behavior is
> > > > > > reproducible without that being a complicating factor.
> > > > > > 
> > > > > > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <
> > > > > > rombert@apache.org>
> > > > > > wrote:
> > > > > > 
> > > > > > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > > > > > I am following along conceptually - I want to make sure I understand what's
> > > > > > > > being described.
> > > > > > > >
> > > > > > > > Let's say Sling Instance A starts successfully the first time. If we
> > > > > > > > restart Sling Instance A, we expect subsequent restarts to also succeed,
> > > > > > > > without removing the sling directory.
> > > > > > > > Now let's say Sling Instance B does NOT start successfully the first time.
> > > > > > > > Despite that, we expect subsequent restarts to succeed without removing the
> > > > > > > > sling directory.
> > > > > > > >
> > > > > > > > Correct so far?
> > > > > > >
> > > > > > > Yes, correct.
> > > > > > >
> > > > > > > > Assuming yes... what if this is running in k8s, and k8s sees that Sling
> > > > > > > > Instance B did not start successfully, and kills the pod (removing all pod
> > > > > > > > resources, including that pod's sling directory) in response? Presumably,
> > > > > > > > k8s would then start Sling Instance C, which is a fresh instance with no
> > > > > > > > sling directory. Are we saying we expect C to have a 50/50 chance of
> > > > > > > > starting successfully? Or have we observed different behavior?
> > > > > > > 
> > > > > > > I think that only the first instance starts successfully.
> > > > > > > Additional
> > > > > > > instances will not start unless they have a Sling
> > > > > > > directory
> > > > > > > set up.
> > > > > > > 
> > > > > > > I've tested with a third instance, once two instances are
> > > > > > > up,
> > > > > > > and
> > > > > > > it
> > > > > > > has the exact same behaviour.
> > > > > > > 
> > > > > > > One workaround that I can suggest for a containerized
> > > > > > > environment
> > > > > > > is to
> > > > > > > use a supervisor script that detects the abnormal startup
> > > > > > > problem
> > > > > > > and
> > > > > > > restarts Sling, so that it starts up successfully.
> > > > > > > 
> > > > > > > Another would be to persist the 'sling' directory as a
> > > > > > > per-
> > > > > > > container
> > > > > > > volume. Not sure how easy that is with k8s, but maybe you
> > > > > > > can
> > > > > > > use a
> > > > > > > single ReadWriteMany volume at /sling, and each pod gets
> > > > > > > their own
> > > > > > > ${sling.home} at /sling/${containerId} ( assuming that is
> > > > > > > exposed
> > > > > > > through the downward API).
> > > > > > > 
> > > > > > > As these are workarounds, I would still very much like
> > > > > > > to
> > > > > > > see this
> > > > > > > fixed properly, so please file a bug to track this.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Robert
> > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Ben
> > > > > > > > 
> > > > > > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <
> > > > > > > > camunoz@redhat.com>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > Thanks for the information Robert.
> > > > > > > > > 
> > > > > > > > > To replicate the issue all I needed was a mongodb (I
> > > > > > > > > used
> > > > > > > > > a
> > > > > > > > > full
> > > > > > > > > replica
> > > > > > > > > set, see my instructions in a previous email about
> > > > > > > > > how to
> > > > > > > > > get
> > > > > > > > > one
> > > > > > > > > going
> > > > > > > > > using podman) and a single process running sling.
> > > > > > > > > 
> > > > > > > > > The problem does happen when I do the following:
> > > > > > > > > 
> > > > > > > > > 2. Start Sling instance A, wait for it to start
> > > > > > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > > > > > 4. Start Sling instance B - Error
> > > > > > > > > 
> > > > > > > > > but let me add more
> > > > > > > > > 
> > > > > > > > > 5. Start Sling Instance A again - Success (note I
> > > > > > > > > didn't
> > > > > > > > > remove
> > > > > > > > > the
> > > > > > > > > sling
> > > > > > > > > dir)
> > > > > > > > > 6. Start Sling instance B again - Success (note I
> > > > > > > > > didn't
> > > > > > > > > remove
> > > > > > > > > the
> > > > > > > > > sling
> > > > > > > > > dir)
> > > > > > > > > 
> > > > > > > > > this means that even if Sling recreates the sling
> > > > > > > > > directory and
> > > > > > > > > fails the
> > > > > > > > > startup, next time it will succeed. Unfortunately we
> > > > > > > > > don't have
> > > > > > > > > that luxury
> > > > > > > > > in containers because the sling directory is not
> > > > > > > > > persisted.
> > > > > > > > > 
> > > > > > > > > I think this is a bug, but I'll keep playing with it
> > > > > > > > > a
> > > > > > > > > bit to
> > > > > > > > > see
> > > > > > > > > if I can
> > > > > > > > > find out more.
> > > > > > > > > 
> > > > > > > > > Carlos
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > > > > > > rombert@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz
> > > > > > > > > > wrote:
> > > > > > > > > > > Robert I managed to replicate the issue in a
> > > > > > > > > > > local,
> > > > > > > > > > > non-
> > > > > > > > > > > containerized
> > > > > > > > > > > environment (!!!).
> > > > > > > > > > > 
> > > > > > > > > > > The problem seems to be when the database is kept
> > > > > > > > > > > but
> > > > > > > > > > > the
> > > > > > > > > > > 'sling'
> > > > > > > > > > > directory
> > > > > > > > > > > is cleared out across restarts (as it is for us
> > > > > > > > > > > when
> > > > > > > > > > > the
> > > > > > > > > > > container
> > > > > > > > > > > goes
> > > > > > > > > > > away). As I said before this doesn't seem to be a
> > > > > > > > > > > problem
> > > > > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > Sling 11
> > > > > > > > > > > bundles.
> > > > > > > > > > > 
> > > > > > > > > > > The first basic solution will be to persist the
> > > > > > > > > > > 'sling'
> > > > > > > > > > > directory
> > > > > > > > > > > across
> > > > > > > > > > > restarts, and I was wondering if this is a bug,
> > > > > > > > > > > or as
> > > > > > > > > > > designed.
> > > > > > > > > > 
> > > > > > > > > > I think this should work.
> > > > > > > > > > 
> > > > > > > > > > > I also wonder if once persisted, multiple
> > > > > > > > > > > containers
> > > > > > > > > > > could
> > > > > > > > > > > share this
> > > > > > > > > > > directory.
> > > > > > > > > > 
> > > > > > > > > > This directory can't be shared, as it holds runtime
> > > > > > > > > > data
> > > > > > > > > > related
> > > > > > > > > > to
> > > > > > > > > > Sling. For instance, a bundle that is started in
> > > > > > > > > > instance A
> > > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > starting on instance B.
> > > > > > > > > > 
> > > > > > > > > > There is at least one file ( sling.id ) that holds
> > > > > > > > > > data
> > > > > > > > > > that
> > > > > > > > > > must
> > > > > > > > > > not
> > > > > > > > > > be the same between instances.
> > > > > > > > > > 
> > > > > > > > > > So I would advise marking the directory as
> > > > > > > > > > container-private as a first step.
> > > > > > > > > > 
> > > > > > > > > > Robert
> > > > > > > > > > 
> > > > > > > > > > > Regards,
> > > > > > > > > > > 
> > > > > > > > > > > Carlos
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > Thanks Robert (and once again I can't stress
> > > > > > > > > > > > enough
> > > > > > > > > > > > how
> > > > > > > > > > > > grateful I
> > > > > > > > > > > > am for
> > > > > > > > > > > > all your help).
> > > > > > > > > > > > 
> > > > > > > > > > > > Right now we deploy our container with the
> > > > > > > > > > > > expectation
> > > > > > > > > > > > that
> > > > > > > > > > > > the
> > > > > > > > > > > > mongo db
> > > > > > > > > > > > is the only necessary state we need to keep;
> > > > > > > > > > > > everything
> > > > > > > > > > > > else
> > > > > > > > > > > > is
> > > > > > > > > > > > throwaway.
> > > > > > > > > > > > This means that a totally new container
> > > > > > > > > > > > connected
> > > > > > > > > > > > to the
> > > > > > > > > > > > mongodb
> > > > > > > > > > > > should
> > > > > > > > > > > > pick up the state and run the same as the first
> > > > > > > > > > > > time it
> > > > > > > > > > > > was
> > > > > > > > > > > > fired
> > > > > > > > > > > > up. Do
> > > > > > > > > > > > you think this is an incorrect assumption? If
> > > > > > > > > > > > so,
> > > > > > > > > > > > what
> > > > > > > > > > > > are
> > > > > > > > > > > > other
> > > > > > > > > > > > pieces of
> > > > > > > > > > > > state we should be keeping for subsequent
> > > > > > > > > > > > restarts?
> > > > > > > > > > > > 
> > > > > > > > > > > > This assumption has worked well for us with the
> > > > > > > > > > > > current
> > > > > > > > > > > > sling
> > > > > > > > > > > > 11
> > > > > > > > > > > > release,
> > > > > > > > > > > > but it seems to break with the more up-to-date
> > > > > > > > > > > > bundles.
> > > > > > > > > > > > Perhaps
> > > > > > > > > > > > running
> > > > > > > > > > > > Sling in a container is just not meant to be.
> > > > > > > > > > > > 
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > 
> > > > > > > > > > > > Carlos
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu
> > > > > > > > > > > > <
> > > > > > > > > > > > rombert@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > Hi Carlos,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos
> > > > > > > > > > > > > Munoz
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Thanks Bertrand. How can I run Sling with
> > > > > > > > > > > > > > DEBUG-level
> > > > > > > > > > > > > > logs for
> > > > > > > > > > > > > > every
> > > > > > > > > > > > > > bundle? I tried passing a few configuration
> > > > > > > > > > > > > > arguments
> > > > > > > > > > > > > > from the
> > > > > > > > > > > > > > command line
> > > > > > > > > > > > > > but nothing seemed to work.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > > > > > > 
> > > > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Robert
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Carlos
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand
> > > > > > > > > > > > > > Delacretaz <
> > > > > > > > > > > > > > bdelacretaz@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos
> > > > > > > > > > > > > > > Munoz
> > > > > > > > > > > > > > > <
> > > > > > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > ...Is there a reason why the Jcr
> > > > > > > > > > > > > > > > repository
> > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > restarting?
> > > > > > > > > > > > > > > > And what
> > > > > > > > > > > > > > > > class could we start looking into to
> > > > > > > > > > > > > > > > debug
> > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > this is
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > case?...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > It's not uncommon to see extra restarts
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > OSGi
> > > > > > > > > > > > > > > components at
> > > > > > > > > > > > > > > startup,
> > > > > > > > > > > > > > > for various reasons.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > The simplest way to detect and log
> > > > > > > > > > > > > > > multiple
> > > > > > > > > > > > > > > repository
> > > > > > > > > > > > > > > startups
> > > > > > > > > > > > > > > might
> > > > > > > > > > > > > > > be to implement a
> > > > > > > > > > > > > > > SlingRepositoryInitializer
> > > > > > > > > > > > > > > service
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > that's
> > > > > > > > > > > > > > > called
> > > > > > > > > > > > > > > at every startup, or use the logs of an
> > > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > like the
> > > > > > > > > > > > > > > JCR
> > > > > > > > > > > > > > > RepositoryInitializer [2] if that has
> > > > > > > > > > > > > > > anything to
> > > > > > > > > > > > > > > process in
> > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > system.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > -Bertrand
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > [1] https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > > > > > [2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
I went through the bisect process and I got the first bad commit:

commit bb1e10d97f3c163fb87917ea782afff674050891
Author: Eric Norman <en...@apache.org>
Date:   Sun Dec 16 12:33:08 2018 -0800

    switch to released JCR Base 3.0.6

(I tried it a couple of times just to be sure)

With the commit just before that one, our app starts up fine (though
there are other, unrelated problems).


On Mon, Mar 16, 2020 at 6:12 PM Robert Munteanu <ro...@apache.org> wrote:

> Hi Carlos,
>
> Apologies for the delay ...
>
> What I was thinking of doing myself, but did not have the time for, is
> the following:
>
> 1. Find a version of Sling for which the scenario in SLING-9118 works.
> Perhaps Sling Starter 11 is a good start.
> 2. Run a `git bisect` check between Sling Starter 11 and the current
> master branch
>
> Assuming my guess is correct, git would say
>
> Bisecting: 36 revisions left to test after this (roughly 5 steps)
> [c1aedf7b292f7835ceb4e2f56fedcb3294c60756] Update to Tika 1.21
>
> So not that many steps to test.
>
> If you manage to isolate the change to the starter that broke
> this, it would make it much easier to understand where the problem is
> coming from.
>
> Thanks!
> Robert
>
> On Mon, 2020-03-16 at 16:27 -0400, Carlos Munoz wrote:
> > Hi Robert,
> >
> > Just a friendly ping about this issue :)
> >
> > We could try to submit a fix with some potential guidance from you.
> > For
> > example, which of the many Sling bundles should we start looking at?
> >
> > Regards,
> >
> > Carlos
> >
> >
> > On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <ca...@redhat.com>
> > wrote:
> >
> > > Thanks Robert. As always your help is appreciated.
> > >
> > > On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu
> > > <rombert@apache.org> wrote:
> > >
> > > > Thanks, Ben,
> > > >
> > > > I added a bit more detail, based on our mailing list
> > > > conversations.
> > > > I'll have limited access in the next two weeks, but if no one
> > > > picks it
> > > > up I'll look into it when I get back.
> > > >
> > > > Thanks,
> > > > Robert
> > > >
> > > > On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > > > > I went ahead and created
> > > > > https://issues.apache.org/jira/browse/SLING-9118
> > > > > for this. Although the ultimate goal here is containerization,
> > > > > I
> > > > > neglected
> > > > > to include any details to that effect in the ticket, since the
> > > > > behavior is
> > > > > reproducible without that being a complicating factor.
> > > > >
> > > > > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <
> > > > > rombert@apache.org>
> > > > > wrote:
> > > > >
> > > > > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > > > > I am following along conceptually - I want to make sure I
> > > > > > > understand
> > > > > > > what's
> > > > > > > being described.
> > > > > > >
> > > > > > > Let's say Sling Instance A starts successfully the first
> > > > > > > time. If
> > > > > > > we
> > > > > > > restart Sling Instance A, we expect subsequent restarts to
> > > > > > > also
> > > > > > > succeed,
> > > > > > > without removing the sling directory.
> > > > > > > Now let's say Sling Instance B does NOT start successfully
> > > > > > > the
> > > > > > > first
> > > > > > > time.
> > > > > > > Despite that, we expect subsequent restarts to succeed
> > > > > > > without
> > > > > > > removing the
> > > > > > > sling directory.
> > > > > > >
> > > > > > > Correct so far?
> > > > > >
> > > > > > Yes, correct.
> > > > > >
> > > > > > > Assuming yes... what if this is running in k8s, and k8s
> > > > > > > sees that
> > > > > > > Sling
> > > > > > > Instance B did not start successfully, and kills the pod
> > > > > > > (removing
> > > > > > > all pod
> > > > > > > resources, including that pod's sling directory) in
> > > > > > > response?
> > > > > > > Presumably,
> > > > > > > k8s would then start Sling Instance C, which is a fresh
> > > > > > > instance
> > > > > > > with
> > > > > > > no
> > > > > > > sling directory. Are we saying we expect C to have a 50/50
> > > > > > > chance
> > > > > > > of
> > > > > > > starting successfully? Or have we observed different
> > > > > > > behavior?
> > > > > >
> > > > > > I think that only the first instance starts successfully.
> > > > > > Additional
> > > > > > instances will not start unless they have a Sling directory
> > > > > > set up.
> > > > > >
> > > > > > I've tested with a third instance, once two instances are up,
> > > > > > and
> > > > > > it
> > > > > > has the exact same behaviour.
> > > > > >
> > > > > > One workaround that I can suggest for a containerized
> > > > > > environment
> > > > > > is to
> > > > > > use a supervisor script that detects the abnormal startup
> > > > > > problem
> > > > > > and
> > > > > > restarts Sling, so that it starts up successfully.
> > > > > >
> > > > > > Another would be to persist the 'sling' directory as a per-
> > > > > > container
> > > > > > volume. Not sure how easy that is with k8s, but maybe you can
> > > > > > use a
> > > > > > single ReadWriteMany volume at /sling, and each pod gets
> > > > > > their own
> > > > > > ${sling.home} at /sling/${containerId} ( assuming that is
> > > > > > exposed
> > > > > > through the downward API).
> > > > > >
> > > > > > As these are workarounds, I would still very much like to
> > > > > > see this
> > > > > > fixed properly, so please file a bug to track this.
> > > > > >
> > > > > > Thanks,
> > > > > > Robert
> > > > > >
> > > > > > > Thanks,
> > > > > > > Ben
> > > > > > >
> > > > > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <
> > > > > > > camunoz@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the information Robert.
> > > > > > > >
> > > > > > > > To replicate the issue all I needed was a mongodb (I used
> > > > > > > > a
> > > > > > > > full
> > > > > > > > replica
> > > > > > > > set, see my instructions in a previous email about how to
> > > > > > > > get
> > > > > > > > one
> > > > > > > > going
> > > > > > > > using podman) and a single process running sling.
> > > > > > > >
> > > > > > > > The problem does happen when I do the following:
> > > > > > > >
> > > > > > > > 2. Start Sling instance A, wait for it to start
> > > > > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > > > > 4. Start Sling instance B - Error
> > > > > > > >
> > > > > > > > but let me add more
> > > > > > > >
> > > > > > > > 5. Start Sling Instance A again - Success (note I didn't
> > > > > > > > remove
> > > > > > > > the
> > > > > > > > sling
> > > > > > > > dir)
> > > > > > > > 6. Start Sling instance B again - Success (note I didn't
> > > > > > > > remove
> > > > > > > > the
> > > > > > > > sling
> > > > > > > > dir)
> > > > > > > >
> > > > > > > > this means that even if Sling recreates the sling
> > > > > > > > directory and
> > > > > > > > fails the
> > > > > > > > startup, next time it will succeed. Unfortunately we
> > > > > > > > don't have
> > > > > > > > that luxury
> > > > > > > > in containers because the sling directory is not
> > > > > > > > persisted.
> > > > > > > >
> > > > > > > > I think this is a bug, but I'll keep playing with it a
> > > > > > > > bit to
> > > > > > > > see
> > > > > > > > if I can
> > > > > > > > find out more.
> > > > > > > >
> > > > > > > > Carlos
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > > > > > rombert@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > > > > > Robert I managed to replicate the issue in a local,
> > > > > > > > > > non-
> > > > > > > > > > containerized
> > > > > > > > > > environment (!!!).
> > > > > > > > > >
> > > > > > > > > > The problem seems to be when the database is kept but
> > > > > > > > > > the
> > > > > > > > > > 'sling'
> > > > > > > > > > directory
> > > > > > > > > > is cleared out across restarts (as it is for us when
> > > > > > > > > > the
> > > > > > > > > > container
> > > > > > > > > > goes
> > > > > > > > > > away). As I said before this doesn't seem to be a
> > > > > > > > > > problem
> > > > > > > > > > with
> > > > > > > > > > the
> > > > > > > > > > Sling 11
> > > > > > > > > > bundles.
> > > > > > > > > >
> > > > > > > > > > The first basic solution will be to persist the
> > > > > > > > > > 'sling'
> > > > > > > > > > directory
> > > > > > > > > > across
> > > > > > > > > > restarts, and I was wondering if this is a bug, or as
> > > > > > > > > > designed.
> > > > > > > > >
> > > > > > > > > I think this should work.
> > > > > > > > >
> > > > > > > > > > I also wonder if once persisted, multiple containers
> > > > > > > > > > could
> > > > > > > > > > share this
> > > > > > > > > > directory.
> > > > > > > > >
> > > > > > > > > This directory can't be shared, as it holds runtime
> > > > > > > > > data
> > > > > > > > > related
> > > > > > > > > to
> > > > > > > > > Sling. For instance, a bundle that is started in
> > > > > > > > > instance A
> > > > > > > > > could
> > > > > > > > > be
> > > > > > > > > starting on instance B.
> > > > > > > > >
> > > > > > > > > There is at least one file ( sling.id ) that holds data
> > > > > > > > > that
> > > > > > > > > must
> > > > > > > > > not
> > > > > > > > > be the same between instances.
> > > > > > > > >
> > > > > > > > > So I would advise marking the directory as
> > > > > > > > > container-private as a first step.
> > > > > > > > >
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > Carlos
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks Robert (and once again I can't stress enough
> > > > > > > > > > > how
> > > > > > > > > > > grateful I
> > > > > > > > > > > am for
> > > > > > > > > > > all your help).
> > > > > > > > > > >
> > > > > > > > > > > Right now we deploy our container with the
> > > > > > > > > > > expectation
> > > > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > mongo db
> > > > > > > > > > > is the only necessary state we need to keep;
> > > > > > > > > > > everything
> > > > > > > > > > > else
> > > > > > > > > > > is
> > > > > > > > > > > throwaway.
> > > > > > > > > > > This means that a totally new container connected
> > > > > > > > > > > to the
> > > > > > > > > > > mongodb
> > > > > > > > > > > should
> > > > > > > > > > > pick up the state and run the same as the first
> > > > > > > > > > > time it
> > > > > > > > > > > was
> > > > > > > > > > > fired
> > > > > > > > > > > up. Do
> > > > > > > > > > > you think this is an incorrect assumption? If so,
> > > > > > > > > > > what
> > > > > > > > > > > are
> > > > > > > > > > > other
> > > > > > > > > > > pieces of
> > > > > > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > > > > >
> > > > > > > > > > > This assumption has worked well for us with the
> > > > > > > > > > > current
> > > > > > > > > > > sling
> > > > > > > > > > > 11
> > > > > > > > > > > release,
> > > > > > > > > > > but it seems to break with the more up-to-date
> > > > > > > > > > > bundles.
> > > > > > > > > > > Perhaps
> > > > > > > > > > > running
> > > > > > > > > > > Sling in a container is just not meant to be.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Carlos
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > > > > > > rombert@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Carlos,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Thanks Bertrand. How can I run Sling with
> > > > > > > > > > > > > DEBUG-level
> > > > > > > > > > > > > logs for
> > > > > > > > > > > > > every
> > > > > > > > > > > > > bundle? I tried passing a few configuration
> > > > > > > > > > > > > arguments
> > > > > > > > > > > > > from the
> > > > > > > > > > > > > command line
> > > > > > > > > > > > > but nothing seemed to work.
> > > > > > > > > > > >
> > > > > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Robert
> > > > > > > > > > > >
> > > > > > > > > > > > > Carlos
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand
> > > > > > > > > > > > > Delacretaz <
> > > > > > > > > > > > > bdelacretaz@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz
> > > > > > > > > > > > > > <
> > > > > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > ...Is there a reason why the Jcr repository
> > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > restarting?
> > > > > > > > > > > > > > > And what
> > > > > > > > > > > > > > > class could we start looking into to debug
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > this is
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > case?...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's not uncommon to see extra restarts of
> > > > > > > > > > > > > > OSGi
> > > > > > > > > > > > > > components at
> > > > > > > > > > > > > > startup,
> > > > > > > > > > > > > > for various reasons.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The simplest way to detect and log multiple
> > > > > > > > > > > > > > repository
> > > > > > > > > > > > > > startups
> > > > > > > > > > > > > > might
> > > > > > > > > > > > > > be to implement a SlingRepositoryInitializer
> > > > > > > > > > > > > > service
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > that's
> > > > > > > > > > > > > > called
> > > > > > > > > > > > > > at every startup, or use the logs of an
> > > > > > > > > > > > > > existing
> > > > > > > > > > > > > > one
> > > > > > > > > > > > > > like the
> > > > > > > > > > > > > > JCR
> > > > > > > > > > > > > > RepositoryInitializer [2] if that has
> > > > > > > > > > > > > > anything to
> > > > > > > > > > > > > > process in
> > > > > > > > > > > > > > your
> > > > > > > > > > > > > > system.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -Bertrand
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > > > > [2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Hi Carlos,

Apologies for the delay ...

What I was thinking of doing myself, but did not have the time for, is
the following:

1. Find a version of Sling for which the scenario in SLING-9118 works.
Perhaps Sling Starter 11 is a good start.
2. Run a `git bisect` check between Sling Starter 11 and the current
master branch

Assuming my guess is correct, git would say

Bisecting: 36 revisions left to test after this (roughly 5 steps)
[c1aedf7b292f7835ceb4e2f56fedcb3294c60756] Update to Tika 1.21

So not that many steps to test.
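
To illustrate why the number of steps stays small, here is a minimal Python sketch of the binary search that `git bisect` performs conceptually. It is not git's actual implementation: the commit list, the `is_bad` check and the `c41` culprit are hypothetical stand-ins, and real history with merges or skipped commits is more involved. It assumes a linear history ordered oldest to newest, a last commit that is known to be bad, and a regression that stays present once introduced.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes `commits` is ordered oldest to newest, the last commit is
    known to be bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1   # hi always points at a known-bad commit
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1                 # one build-and-start check per step
        if is_bad(commits[mid]):
            hi = mid               # first bad commit is at mid or earlier
        else:
            lo = mid + 1           # first bad commit is after mid
    return commits[hi], tests


if __name__ == "__main__":
    # Hypothetical linear history; "c41" is assumed to introduce the problem.
    history = [f"c{i}" for i in range(73)]
    culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 41)
    print(f"first bad commit: {culprit}, found after {tests} tests")
```

In practice `is_bad` would be the manual check from SLING-9118: build the starter at that commit, run the restart scenario, and report whether the startup error shows up.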

If you manage to isolate the change to the starter that broke
this, it would make it much easier to understand where the problem is
coming from.

Thanks!
Robert

On Mon, 2020-03-16 at 16:27 -0400, Carlos Munoz wrote:
> Hi Robert,
> 
> Just a friendly ping about this issue :)
> 
> We could try to submit a fix with some potential guidance from you.
> For
> example, which of the many Sling bundles should we start looking at?
> 
> Regards,
> 
> Carlos
> 
> 
> On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <ca...@redhat.com>
> wrote:
> 
> > Thanks Robert. As always your help is appreciated.
> > 
> > On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu
> > <rombert@apache.org> wrote:
> > 
> > > Thanks, Ben,
> > > 
> > > I added a bit more detail, based on our mailing list
> > > conversations.
> > > I'll have limited access in the next two weeks, but if no one
> > > picks it
> > > up I'll look into it when I get back.
> > > 
> > > Thanks,
> > > Robert
> > > 
> > > On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > > > I went ahead and created
> > > > https://issues.apache.org/jira/browse/SLING-9118
> > > > for this. Although the ultimate goal here is containerization,
> > > > I
> > > > neglected
> > > > to include any details to that effect in the ticket, since the
> > > > behavior is
> > > > reproducible without that being a complicating factor.
> > > > 
> > > > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <
> > > > rombert@apache.org>
> > > > wrote:
> > > > 
> > > > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > > > I am following along conceptually - I want to make sure I
> > > > > > understand
> > > > > > what's
> > > > > > being described.
> > > > > > 
> > > > > > Let's say Sling Instance A starts successfully the first
> > > > > > time. If
> > > > > > we
> > > > > > restart Sling Instance A, we expect subsequent restarts to
> > > > > > also
> > > > > > succeed,
> > > > > > without removing the sling directory.
> > > > > > Now let's say Sling Instance B does NOT start successfully
> > > > > > the
> > > > > > first
> > > > > > time.
> > > > > > Despite that, we expect subsequent restarts to succeed
> > > > > > without
> > > > > > removing the
> > > > > > sling directory.
> > > > > > 
> > > > > > Correct so far?
> > > > > 
> > > > > Yes, correct.
> > > > > 
> > > > > > Assuming yes... what if this is running in k8s, and k8s
> > > > > > sees that
> > > > > > Sling
> > > > > > Instance B did not start successfully, and kills the pod
> > > > > > (removing
> > > > > > all pod
> > > > > > resources, including that pod's sling directory) in
> > > > > > response?
> > > > > > Presumably,
> > > > > > k8s would then start Sling Instance C, which is a fresh
> > > > > > instance
> > > > > > with
> > > > > > no
> > > > > > sling directory. Are we saying we expect C to have a 50/50
> > > > > > chance
> > > > > > of
> > > > > > starting successfully? Or have we observed different
> > > > > > behavior?
> > > > > 
> > > > > I think that only the first instance starts successfully.
> > > > > Additional
> > > > > instances will not start unless they have a Sling directory
> > > > > set up.
> > > > > 
> > > > > I've tested with a third instance, once two instances are up,
> > > > > and
> > > > > it
> > > > > has the exact same behaviour.
> > > > > 
> > > > > One workaround that I can suggest for a containerized
> > > > > environment
> > > > > is to
> > > > > use a supervisor script that detects the abnormal startup
> > > > > problem
> > > > > and
> > > > > restarts Sling, so that it starts up successfully.
> > > > > 
> > > > > Another would be to persist the 'sling' directory as a per-
> > > > > container
> > > > > volume. Not sure how easy that is with k8s, but maybe you can
> > > > > use a
> > > > > single ReadWriteMany volume at /sling, and each pod gets
> > > > > their own
> > > > > ${sling.home} at /sling/${containerId} ( assuming that is
> > > > > exposed
> > > > > through the downward API).
> > > > > 
> > > > > > As these are workarounds, I would still very much like to
> > > > > see this
> > > > > fixed properly, so please file a bug to track this.
> > > > > 
> > > > > Thanks,
> > > > > Robert
> > > > > 
> > > > > > Thanks,
> > > > > > Ben
> > > > > > 
> > > > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <
> > > > > > camunoz@redhat.com
> > > > > > wrote:
> > > > > > 
> > > > > > > Thanks for the information Robert.
> > > > > > > 
> > > > > > > To replicate the issue all I needed was a mongodb (I used
> > > > > > > a
> > > > > > > full
> > > > > > > replica
> > > > > > > set, see my instructions in a previous email about how to
> > > > > > > get
> > > > > > > one
> > > > > > > going
> > > > > > > using podman) and a single process running sling.
> > > > > > > 
> > > > > > > The problem does happen when I do the following:
> > > > > > > 
> > > > > > > 2. Start Sling instance A, wait for it to start
> > > > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > > > 4. Start Sling instance B - Error
> > > > > > > 
> > > > > > > but let me add more
> > > > > > > 
> > > > > > > 5. Start Sling Instance A again - Success (note I didn't
> > > > > > > remove
> > > > > > > the
> > > > > > > sling
> > > > > > > dir)
> > > > > > > 6. Start Sling instance B again - Success (note I didn't
> > > > > > > remove
> > > > > > > the
> > > > > > > sling
> > > > > > > dir)
> > > > > > > 
> > > > > > > this means that even if Sling recreates the sling
> > > > > > > directory and
> > > > > > > fails the
> > > > > > > startup, next time it will succeed. Unfortunately we
> > > > > > > don't have
> > > > > > > that luxury
> > > > > > > in containers because the sling directory is not
> > > > > > > persisted.
> > > > > > > 
> > > > > > > I think this is a bug, but I'll keep playing with it a
> > > > > > > bit to
> > > > > > > see
> > > > > > > if I can
> > > > > > > find out more.
> > > > > > > 
> > > > > > > Carlos
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > > > > rombert@apache.org
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > > > > Robert I managed to replicate the issue in a local,
> > > > > > > > > non-
> > > > > > > > > containerized
> > > > > > > > > environment (!!!).
> > > > > > > > > 
> > > > > > > > > The problem seems to be when the database is kept but
> > > > > > > > > the
> > > > > > > > > 'sling'
> > > > > > > > > directory
> > > > > > > > > is cleared out across restarts (as it is for us when
> > > > > > > > > the
> > > > > > > > > container
> > > > > > > > > goes
> > > > > > > > > away). As I said before this doesn't seem to be a
> > > > > > > > > problem
> > > > > > > > > with
> > > > > > > > > the
> > > > > > > > > Sling 11
> > > > > > > > > bundles.
> > > > > > > > > 
> > > > > > > > > The first basic solution will be to persist the
> > > > > > > > > 'sling'
> > > > > > > > > directory
> > > > > > > > > across
> > > > > > > > > restarts, and I was wondering if this is a bug, or as
> > > > > > > > > designed.
> > > > > > > > 
> > > > > > > > I think this should work.
> > > > > > > > 
> > > > > > > > > I also wonder if once persisted, multiple containers
> > > > > > > > > could
> > > > > > > > > share this
> > > > > > > > > directory.
> > > > > > > > 
> > > > > > > > This directory can't be shared, as it holds runtime
> > > > > > > > data
> > > > > > > > related
> > > > > > > > to
> > > > > > > > Sling. For instance, a bundle that is started in
> > > > > > > > instance A
> > > > > > > > could
> > > > > > > > be
> > > > > > > > starting on instance B.
> > > > > > > > 
> > > > > > > > There is at least one file ( sling.id ) that holds data
> > > > > > > > that
> > > > > > > > must
> > > > > > > > not
> > > > > > > > be the same between instances.
> > > > > > > > 
> > > > > > > > > So I would advise marking the directory as
> > > > > > > > container-
> > > > > > > > private
> > > > > > > > as a
> > > > > > > > first step.
> > > > > > > > 
> > > > > > > > Robert
> > > > > > > > 
> > > > > > > > > Regards,
> > > > > > > > > 
> > > > > > > > > Carlos
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > > > > > camunoz@redhat.com>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Thanks Robert (and once again I can't stress enough
> > > > > > > > > > how
> > > > > > > > > > grateful I
> > > > > > > > > > am for
> > > > > > > > > > all your help).
> > > > > > > > > > 
> > > > > > > > > > Right now we deploy our container with the
> > > > > > > > > > expectation
> > > > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > mongo db
> > > > > > > > > > is the only necessary state we need to keep;
> > > > > > > > > > everything
> > > > > > > > > > else
> > > > > > > > > > is
> > > > > > > > > > throwaway.
> > > > > > > > > > This means that a totally new container connected
> > > > > > > > > > to the
> > > > > > > > > > mongodb
> > > > > > > > > > should
> > > > > > > > > > pick up the state and run the same as the first
> > > > > > > > > > time it
> > > > > > > > > > was
> > > > > > > > > > fired
> > > > > > > > > > up. Do
> > > > > > > > > > you think this is an incorrect assumption? If so,
> > > > > > > > > > what
> > > > > > > > > > are
> > > > > > > > > > other
> > > > > > > > > > pieces of
> > > > > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > > > > 
> > > > > > > > > > This assumption has worked well for us with the
> > > > > > > > > > current
> > > > > > > > > > sling
> > > > > > > > > > 11
> > > > > > > > > > release,
> > > > > > > > > > but it seems to break with the more up-to-date
> > > > > > > > > > bundles.
> > > > > > > > > > Perhaps
> > > > > > > > > > running
> > > > > > > > > > Sling in a container is just not meant to be.
> > > > > > > > > > 
> > > > > > > > > > Regards,
> > > > > > > > > > 
> > > > > > > > > > Carlos
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > > > > > rombert@apache.org
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > Hi Carlos,
> > > > > > > > > > > 
> > > > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Thanks Bertrand. How can I run Sling with
> > > > > > > > > > > > DEBUG-level
> > > > > > > > > > > > logs for
> > > > > > > > > > > > every
> > > > > > > > > > > > bundle? I tried passing a few configuration
> > > > > > > > > > > > arguments
> > > > > > > > > > > > from the
> > > > > > > > > > > > command line
> > > > > > > > > > > > but nothing seemed to work.
> > > > > > > > > > > 
> > > > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Robert
> > > > > > > > > > > 
> > > > > > > > > > > > Carlos
> > > > > > > > > > > > 
> > > > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand
> > > > > > > > > > > > Delacretaz <
> > > > > > > > > > > > bdelacretaz@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz
> > > > > > > > > > > > > <
> > > > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > ...Is there a reason why the Jcr repository
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > restarting?
> > > > > > > > > > > > > > And what
> > > > > > > > > > > > > > class could we start looking into to debug
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > this is
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > case?...
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It's not uncommon to see extra restarts of
> > > > > > > > > > > > > OSGi
> > > > > > > > > > > > > components at
> > > > > > > > > > > > > startup,
> > > > > > > > > > > > > for various reasons.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The simplest way to detect and log multiple
> > > > > > > > > > > > > repository
> > > > > > > > > > > > > startups
> > > > > > > > > > > > > might
> > > > > > > > > > > > > be to implement a SlingRepositoryInitializer
> > > > > > > > > > > > > service
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > that's
> > > > > > > > > > > > > called
> > > > > > > > > > > > > at every startup, or use the logs of an
> > > > > > > > > > > > > existing
> > > > > > > > > > > > > one
> > > > > > > > > > > > > like the
> > > > > > > > > > > > > JCR
> > > > > > > > > > > > > RepositoryInitializer [2] if that has
> > > > > > > > > > > > > anything to
> > > > > > > > > > > > > process in
> > > > > > > > > > > > > your
> > > > > > > > > > > > > system.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > -Bertrand
> > > > > > > > > > > > > 
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > 
> > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > > > [2]
> > > > > > > > > > > > > 
> > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
> > > > > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Hi Robert,

Just a friendly ping about this issue :)

We could try to submit a fix with some potential guidance from you. For
example, which of the many Sling bundles should we start looking at?

Regards,

Carlos


On Wed, Feb 26, 2020 at 7:24 AM Carlos Munoz <ca...@redhat.com> wrote:

> Thanks Robert. As always your help is appreciated.
>
> On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu <ro...@apache.org>
> wrote:
>
>> Thanks, Ben,
>>
>> I added a bit more detail, based on our mailing list conversations.
>> I'll have limited access in the next two weeks, but if no one picks it
>> up I'll look into it when I get back.
>>
>> Thanks,
>> Robert
>>
>> On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
>> > I went ahead and created
>> > https://issues.apache.org/jira/browse/SLING-9118
>> > for this. Although the ultimate goal here is containerization, I
>> > neglected
>> > to include any details to that effect in the ticket, since the
>> > behavior is
>> > reproducible without that being a complicating factor.
>> >
>> > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <ro...@apache.org>
>> > wrote:
>> >
>> > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
>> > > > I am following along conceptually - I want to make sure I
>> > > > understand
>> > > > what's
>> > > > being described.
>> > > >
>> > > > Let's say Sling Instance A starts successfully the first time. If
>> > > > we
>> > > > restart Sling Instance A, we expect subsequent restarts to also
>> > > > succeed,
>> > > > without removing the sling directory.
>> > > > Now let's say Sling Instance B does NOT start successfully the
>> > > > first
>> > > > time.
>> > > > Despite that, we expect subsequent restarts to succeed without
>> > > > removing the
>> > > > sling directory.
>> > > >
>> > > > Correct so far?
>> > >
>> > > Yes, correct.
>> > >
>> > > > Assuming yes... what if this is running in k8s, and k8s sees that
>> > > > Sling
>> > > > Instance B did not start successfully, and kills the pod
>> > > > (removing
>> > > > all pod
>> > > > resources, including that pod's sling directory) in response?
>> > > > Presumably,
>> > > > k8s would then start Sling Instance C, which is a fresh instance
>> > > > with
>> > > > no
>> > > > sling directory. Are we saying we expect C to have a 50/50 chance
>> > > > of
>> > > > starting successfully? Or have we observed different behavior?
>> > >
>> > > I think that only the first instance starts successfully.
>> > > Additional
>> > > instances will not start unless they have a Sling directory set up.
>> > >
>> > > I've tested with a third instance, once two instances are up, and
>> > > it
>> > > has the exact same behaviour.
>> > >
>> > > One workaround that I can suggest for a containerized environment
>> > > is to
>> > > use a supervisor script that detects the abnormal startup problem
>> > > and
>> > > restarts Sling, so that it starts up successfully.
>> > >
>> > > Another would be to persist the 'sling' directory as a per-
>> > > container
>> > > volume. Not sure how easy that is with k8s, but maybe you can use a
>> > > single ReadWriteMany volume at /sling, and each pod gets their own
>> > > ${sling.home} at /sling/${containerId} ( assuming that is exposed
>> > > through the downward API).
>> > >
>> > > As these are workarounds, I would still very much like to see this
>> > > fixed properly, so please file a bug to track this.
>> > >
>> > > Thanks,
>> > > Robert
>> > >
>> > > > Thanks,
>> > > > Ben
>> > > >
>> > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <camunoz@redhat.com
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > Thanks for the information Robert.
>> > > > >
>> > > > > To replicate the issue all I needed was a mongodb (I used a
>> > > > > full
>> > > > > replica
>> > > > > set, see my instructions in a previous email about how to get
>> > > > > one
>> > > > > going
>> > > > > using podman) and a single process running sling.
>> > > > >
>> > > > > The problem does happen when I do the following:
>> > > > >
>> > > > > 2. Start Sling instance A, wait for it to start
>> > > > > 3. Stop Sling instance A, wait for it to stop
>> > > > > 4. Start Sling instance B - Error
>> > > > >
>> > > > > but let me add more
>> > > > >
>> > > > > 5. Start Sling Instance A again - Success (note I didn't remove
>> > > > > the
>> > > > > sling
>> > > > > dir)
>> > > > > 6. Start Sling instance B again - Success (note I didn't remove
>> > > > > the
>> > > > > sling
>> > > > > dir)
>> > > > >
>> > > > > this means that even if Sling recreates the sling directory and
>> > > > > fails the
>> > > > > startup, next time it will succeed. Unfortunately we don't have
>> > > > > that luxury
>> > > > > in containers because the sling directory is not persisted.
>> > > > >
>> > > > > I think this is a bug, but I'll keep playing with it a bit to
>> > > > > see
>> > > > > if I can
>> > > > > find out more.
>> > > > >
>> > > > > Carlos
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
>> > > > > rombert@apache.org
>> > > > > wrote:
>> > > > >
>> > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
>> > > > > > > Robert I managed to replicate the issue in a local, non-
>> > > > > > > containerized
>> > > > > > > environment (!!!).
>> > > > > > >
>> > > > > > > The problem seems to be when the database is kept but the
>> > > > > > > 'sling'
>> > > > > > > directory
>> > > > > > > is cleared out across restarts (as it is for us when the
>> > > > > > > container
>> > > > > > > goes
>> > > > > > > away). As I said before this doesn't seem to be a problem
>> > > > > > > with
>> > > > > > > the
>> > > > > > > Sling 11
>> > > > > > > bundles.
>> > > > > > >
>> > > > > > > The first basic solution will be to persist the 'sling'
>> > > > > > > directory
>> > > > > > > across
>> > > > > > > restarts, and I was wondering if this is a bug, or as
>> > > > > > > designed.
>> > > > > >
>> > > > > > I think this should work.
>> > > > > >
>> > > > > > > I also wonder if once persisted, multiple containers could
>> > > > > > > share this
>> > > > > > > directory.
>> > > > > >
>> > > > > > This directory can't be shared, as it holds runtime data
>> > > > > > related
>> > > > > > to
>> > > > > > Sling. For instance, a bundle that is started in instance A
>> > > > > > could
>> > > > > > be
>> > > > > > starting on instance B.
>> > > > > >
>> > > > > > There is at least one file ( sling.id ) that holds data that
>> > > > > > must
>> > > > > > not
>> > > > > > be the same between instances.
>> > > > > >
>> > > > > > So I would advise marking the directory as container-
>> > > > > > private
>> > > > > > as a
>> > > > > > first step.
>> > > > > >
>> > > > > > Robert
>> > > > > >
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > Carlos
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
>> > > > > > > camunoz@redhat.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks Robert (and once again I can't stress enough how
>> > > > > > > > grateful I
>> > > > > > > > am for
>> > > > > > > > all your help).
>> > > > > > > >
>> > > > > > > > Right now we deploy our container with the expectation
>> > > > > > > > that
>> > > > > > > > the
>> > > > > > > > mongo db
>> > > > > > > > is the only necessary state we need to keep; everything
>> > > > > > > > else
>> > > > > > > > is
>> > > > > > > > throwaway.
>> > > > > > > > This means that a totally new container connected to the
>> > > > > > > > mongodb
>> > > > > > > > should
>> > > > > > > > pick up the state and run the same as the first time it
>> > > > > > > > was
>> > > > > > > > fired
>> > > > > > > > up. Do
>> > > > > > > > you think this is an incorrect assumption? If so, what
>> > > > > > > > are
>> > > > > > > > other
>> > > > > > > > pieces of
>> > > > > > > > state we should be keeping for subsequent restarts?
>> > > > > > > >
>> > > > > > > > This assumption has worked well for us with the current
>> > > > > > > > sling
>> > > > > > > > 11
>> > > > > > > > release,
>> > > > > > > > but it seems to break with the more up-to-date bundles.
>> > > > > > > > Perhaps
>> > > > > > > > running
>> > > > > > > > Sling in a container is just not meant to be.
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > >
>> > > > > > > > Carlos
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
>> > > > > > > > rombert@apache.org
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Carlos,
>> > > > > > > > >
>> > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
>> > > > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level
>> > > > > > > > > > logs for
>> > > > > > > > > > every
>> > > > > > > > > > bundle? I tried passing a few configuration arguments
>> > > > > > > > > > from the
>> > > > > > > > > > command line
>> > > > > > > > > > but nothing seemed to work.
>> > > > > > > > >
>> > > > > > > > > Try configuring the LogManager to debug at
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > >
>> https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
>> > > > > > > > > Thanks,
>> > > > > > > > > Robert
>> > > > > > > > >
>> > > > > > > > > > Carlos
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
>> > > > > > > > > > bdelacretaz@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
>> > > > > > > > > > > camunoz@redhat.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > ...Is there a reason why the Jcr repository could
>> > > > > > > > > > > > be
>> > > > > > > > > > > > restarting?
>> > > > > > > > > > > > And what
>> > > > > > > > > > > > class could we start looking into to debug if
>> > > > > > > > > > > > this is
>> > > > > > > > > > > > the
>> > > > > > > > > > > > case?...
>> > > > > > > > > > >
>> > > > > > > > > > > It's not uncommon to see extra restarts of OSGi
>> > > > > > > > > > > components at
>> > > > > > > > > > > startup,
>> > > > > > > > > > > for various reasons.
>> > > > > > > > > > >
>> > > > > > > > > > > The simplest way to detect and log multiple
>> > > > > > > > > > > repository
>> > > > > > > > > > > startups
>> > > > > > > > > > > might
>> > > > > > > > > > > be to implement a SlingRepositoryInitializer
>> > > > > > > > > > > service
>> > > > > > > > > > > [1]
>> > > > > > > > > > > that's
>> > > > > > > > > > > called
>> > > > > > > > > > > at every startup, or use the logs of an existing
>> > > > > > > > > > > one
>> > > > > > > > > > > like the
>> > > > > > > > > > > JCR
>> > > > > > > > > > > RepositoryInitializer [2] if that has anything to
>> > > > > > > > > > > process in
>> > > > > > > > > > > your
>> > > > > > > > > > > system.
>> > > > > > > > > > >
>> > > > > > > > > > > -Bertrand
>> > > > > > > > > > >
>> > > > > > > > > > > [1]
>> > > > > > > > > > >
>> > >
>> https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
>> > > > > > > > > > > [2]
>> > > > > > > > > > >
>> > >
>> https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
>> > >
>> > >
>>
>>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Robert. As always your help is appreciated.

On Fri, Feb 21, 2020 at 6:28 PM Robert Munteanu <ro...@apache.org> wrote:

> Thanks, Ben,
>
> I added a bit more detail, based on our mailing list conversations.
> I'll have limited access in the next two weeks, but if no one picks it
> up I'll look into it when I get back.
>
> Thanks,
> Robert
>
> On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> > I went ahead and created
> > https://issues.apache.org/jira/browse/SLING-9118
> > for this. Although the ultimate goal here is containerization, I
> > neglected
> > to include any details to that effect in the ticket, since the
> > behavior is
> > reproducible without that being a complicating factor.
> >
> > On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <ro...@apache.org>
> > wrote:
> >
> > > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > > I am following along conceptually - I want to make sure I
> > > > understand
> > > > what's
> > > > being described.
> > > >
> > > > Let's say Sling Instance A starts successfully the first time. If
> > > > we
> > > > restart Sling Instance A, we expect subsequent restarts to also
> > > > succeed,
> > > > without removing the sling directory.
> > > > Now let's say Sling Instance B does NOT start successfully the
> > > > first
> > > > time.
> > > > Despite that, we expect subsequent restarts to succeed without
> > > > removing the
> > > > sling directory.
> > > >
> > > > Correct so far?
> > >
> > > Yes, correct.
> > >
> > > > Assuming yes... what if this is running in k8s, and k8s sees that
> > > > Sling
> > > > Instance B did not start successfully, and kills the pod
> > > > (removing
> > > > all pod
> > > > resources, including that pod's sling directory) in response?
> > > > Presumably,
> > > > k8s would then start Sling Instance C, which is a fresh instance
> > > > with
> > > > no
> > > > sling directory. Are we saying we expect C to have a 50/50 chance
> > > > of
> > > > starting successfully? Or have we observed different behavior?
> > >
> > > I think that only the first instance starts successfully.
> > > Additional
> > > instances will not start unless they have a Sling directory set up.
> > >
> > > I've tested with a third instance, once two instances are up, and
> > > it
> > > has the exact same behaviour.
> > >
> > > One workaround that I can suggest for a containerized environment
> > > is to
> > > use a supervisor script that detects the abnormal startup problem
> > > and
> > > restarts Sling, so that it starts up successfully.
> > >
> > > Another would be to persist the 'sling' directory as a per-
> > > container
> > > volume. Not sure how easy that is with k8s, but maybe you can use a
> > > single ReadWriteMany volume at /sling, and each pod gets their own
> > > ${sling.home} at /sling/${containerId} ( assuming that is exposed
> > > through the downward API).
> > >
> > > > As these are workarounds, I would still very much like to see this
> > > fixed properly, so please file a bug to track this.
> > >
> > > Thanks,
> > > Robert
> > >
> > > > Thanks,
> > > > Ben
> > > >
> > > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <camunoz@redhat.com
> > > > >
> > > > wrote:
> > > >
> > > > > Thanks for the information Robert.
> > > > >
> > > > > To replicate the issue all I needed was a mongodb (I used a
> > > > > full
> > > > > replica
> > > > > set, see my instructions in a previous email about how to get
> > > > > one
> > > > > going
> > > > > using podman) and a single process running sling.
> > > > >
> > > > > The problem does happen when I do the following:
> > > > >
> > > > > 2. Start Sling instance A, wait for it to start
> > > > > 3. Stop Sling instance A, wait for it to stop
> > > > > 4. Start Sling instance B - Error
> > > > >
> > > > > but let me add more
> > > > >
> > > > > 5. Start Sling Instance A again - Success (note I didn't remove
> > > > > the
> > > > > sling
> > > > > dir)
> > > > > 6. Start Sling instance B again - Success (note I didn't remove
> > > > > the
> > > > > sling
> > > > > dir)
> > > > >
> > > > > this means that even if Sling recreates the sling directory and
> > > > > fails the
> > > > > startup, next time it will succeed. Unfortunately we don't have
> > > > > that luxury
> > > > > in containers because the sling directory is not persisted.
> > > > >
> > > > > I think this is a bug, but I'll keep playing with it a bit to
> > > > > see
> > > > > if I can
> > > > > find out more.
> > > > >
> > > > > Carlos
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > > rombert@apache.org
> > > > > wrote:
> > > > >
> > > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > > Robert I managed to replicate the issue in a local, non-
> > > > > > > containerized
> > > > > > > environment (!!!).
> > > > > > >
> > > > > > > The problem seems to be when the database is kept but the
> > > > > > > 'sling'
> > > > > > > directory
> > > > > > > is cleared out across restarts (as it is for us when the
> > > > > > > container
> > > > > > > goes
> > > > > > > away). As I said before this doesn't seem to be a problem
> > > > > > > with
> > > > > > > the
> > > > > > > Sling 11
> > > > > > > bundles.
> > > > > > >
> > > > > > > The first basic solution will be to persist the 'sling'
> > > > > > > directory
> > > > > > > across
> > > > > > > restarts, and I was wondering if this is a bug, or as
> > > > > > > designed.
> > > > > >
> > > > > > I think this should work.
> > > > > >
> > > > > > > I also wonder if once persisted, multiple containers could
> > > > > > > share this
> > > > > > > directory.
> > > > > >
> > > > > > This directory can't be shared, as it holds runtime data
> > > > > > related
> > > > > > to
> > > > > > Sling. For instance, a bundle that is started in instance A
> > > > > > could
> > > > > > be
> > > > > > starting on instance B.
> > > > > >
> > > > > > There is at least one file ( sling.id ) that holds data that
> > > > > > must
> > > > > > not
> > > > > > be the same between instances.
> > > > > >
> > > > > > > So I would advise marking the directory as container-
> > > > > > > private
> > > > > > as a
> > > > > > first step.
> > > > > >
> > > > > > Robert
> > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Carlos
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > > > camunoz@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Robert (and once again I can't stress enough how
> > > > > > > > grateful I
> > > > > > > > am for
> > > > > > > > all your help).
> > > > > > > >
> > > > > > > > Right now we deploy our container with the expectation
> > > > > > > > that
> > > > > > > > the
> > > > > > > > mongo db
> > > > > > > > is the only necessary state we need to keep; everything
> > > > > > > > else
> > > > > > > > is
> > > > > > > > throwaway.
> > > > > > > > This means that a totally new container connected to the
> > > > > > > > mongodb
> > > > > > > > should
> > > > > > > > pick up the state and run the same as the first time it
> > > > > > > > was
> > > > > > > > fired
> > > > > > > > up. Do
> > > > > > > > you think this is an incorrect assumption? If so, what
> > > > > > > > are
> > > > > > > > other
> > > > > > > > pieces of
> > > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > >
> > > > > > > > This assumption has worked well for us with the current
> > > > > > > > sling
> > > > > > > > 11
> > > > > > > > release,
> > > > > > > > but it seems to break with the more up-to-date bundles.
> > > > > > > > Perhaps
> > > > > > > > running
> > > > > > > > Sling in a container is just not meant to be.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Carlos
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > > > rombert@apache.org
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Carlos,
> > > > > > > > >
> > > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level
> > > > > > > > > > logs for
> > > > > > > > > > every
> > > > > > > > > > bundle? I tried passing a few configuration arguments
> > > > > > > > > > from the
> > > > > > > > > > command line
> > > > > > > > > > but nothing seemed to work.
> > > > > > > > >
> > > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > > Thanks,
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > > > Carlos
> > > > > > > > > >
> > > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > > > > > > bdelacretaz@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > ...Is there a reason why the Jcr repository could
> > > > > > > > > > > > be
> > > > > > > > > > > > restarting?
> > > > > > > > > > > > And what
> > > > > > > > > > > > class could we start looking into to debug if
> > > > > > > > > > > > this is
> > > > > > > > > > > > the
> > > > > > > > > > > > case?...
> > > > > > > > > > >
> > > > > > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > > > > > components at
> > > > > > > > > > > startup,
> > > > > > > > > > > for various reasons.
> > > > > > > > > > >
> > > > > > > > > > > The simplest way to detect and log multiple
> > > > > > > > > > > repository
> > > > > > > > > > > startups
> > > > > > > > > > > might
> > > > > > > > > > > be to implement a SlingRepositoryInitializer
> > > > > > > > > > > service
> > > > > > > > > > > [1]
> > > > > > > > > > > that's
> > > > > > > > > > > called
> > > > > > > > > > > at every startup, or use the logs of an existing
> > > > > > > > > > > one
> > > > > > > > > > > like the
> > > > > > > > > > > JCR
> > > > > > > > > > > RepositoryInitializer [2] if that has anything to
> > > > > > > > > > > process in
> > > > > > > > > > > your
> > > > > > > > > > > system.
> > > > > > > > > > >
> > > > > > > > > > > -Bertrand
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > > [2]
> > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
> > >
> > >
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Thanks, Ben,

I added a bit more detail, based on our mailing list conversations.
I'll have limited access in the next two weeks, but if no one picks it
up I'll look into it when I get back.

Thanks,
Robert

On Fri, 2020-02-21 at 11:01 -0500, Ben Radey wrote:
> I went ahead and created 
> https://issues.apache.org/jira/browse/SLING-9118
> for this. Although the ultimate goal here is containerization, I
> neglected
> to include any details to that effect in the ticket, since the
> behavior is
> reproducible without that being a complicating factor.
> 
> On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <ro...@apache.org>
> wrote:
> 
> > On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > > I am following along conceptually - I want to make sure I
> > > understand
> > > what's
> > > being described.
> > > 
> > > Let's say Sling Instance A starts successfully the first time. If
> > > we
> > > restart Sling Instance A, we expect subsequent restarts to also
> > > succeed,
> > > without removing the sling directory.
> > > Now let's say Sling Instance B does NOT start successfully the
> > > first
> > > time.
> > > Despite that, we expect subsequent restarts to succeed without
> > > removing the
> > > sling directory.
> > > 
> > > Correct so far?
> > 
> > Yes, correct.
> > 
> > > Assuming yes... what if this is running in k8s, and k8s sees that
> > > Sling
> > > Instance B did not start successfully, and kills the pod
> > > (removing
> > > all pod
> > > resources, including that pod's sling directory) in response?
> > > Presumably,
> > > k8s would then start Sling Instance C, which is a fresh instance
> > > with
> > > no
> > > sling directory. Are we saying we expect C to have a 50/50 chance
> > > of
> > > starting successfully? Or have we observed different behavior?
> > 
> > I think that only the first instance starts successfully.
> > Additional
> > instances will not start unless they have a Sling directory set up.
> > 
> > I've tested with a third instance, once two instances are up, and
> > it
> > has the exact same behaviour.
> > 
> > One workaround that I can suggest for a containerized environment
> > is to
> > use a supervisor script that detects the abnormal startup problem
> > and
> > restarts Sling, so that it starts up successfully.
> > 
> > Another would be to persist the 'sling' directory as a per-
> > container
> > volume. Not sure how easy that is with k8s, but maybe you can use a
> > single ReadWriteMany volume at /sling, and each pod gets their own
> > ${sling.home} at /sling/${containerId} ( assuming that is exposed
> > through the downward API).
> > 
> > As these are workarounds, I would still very much like to see this
> > fixed properly, so please file a bug to track this.
> > 
> > Thanks,
> > Robert
> > 
> > > Thanks,
> > > Ben
> > > 
> > > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <camunoz@redhat.com
> > > >
> > > wrote:
> > > 
> > > > Thanks for the information Robert.
> > > > 
> > > > To replicate the issue all I needed was a mongodb (I used a
> > > > full
> > > > replica
> > > > set, see my instructions in a previous email about how to get
> > > > one
> > > > going
> > > > using podman) and a single process running sling.
> > > > 
> > > > The problem does happen when I do the following:
> > > > 
> > > > 2. Start Sling instance A, wait for it to start
> > > > 3. Stop Sling instance A, wait for it to stop
> > > > 4. Start Sling instance B - Error
> > > > 
> > > > but let me add more
> > > > 
> > > > 5. Start Sling Instance A again - Success (note I didn't remove
> > > > the
> > > > sling
> > > > dir)
> > > > 6. Start Sling instance B again - Success (note I didn't remove
> > > > the
> > > > sling
> > > > dir)
> > > > 
> > > > this means that even if Sling recreates the sling directory and
> > > > fails the
> > > > startup, next time it will succeed. Unfortunately we don't have
> > > > that luxury
> > > > in containers because the sling directory is not persisted.
> > > > 
> > > > I think this is a bug, but I'll keep playing with it a bit to
> > > > see
> > > > if I can
> > > > find out more.
> > > > 
> > > > Carlos
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <
> > > > rombert@apache.org>
> > > > wrote:
> > > > 
> > > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > > Robert I managed to replicate the issue in a local, non-
> > > > > > containerized
> > > > > > environment (!!!).
> > > > > > 
> > > > > > The problem seems to be when the database is kept but the
> > > > > > 'sling'
> > > > > > directory
> > > > > > is cleared out across restarts (as it is for us when the
> > > > > > container
> > > > > > goes
> > > > > > away). As I said before this doesn't seem to be a problem
> > > > > > with
> > > > > > the
> > > > > > Sling 11
> > > > > > bundles.
> > > > > > 
> > > > > > The first basic solution will be to persist the 'sling'
> > > > > > directory
> > > > > > across
> > > > > > restarts, and I was wondering if this is a bug, or as
> > > > > > designed.
> > > > > 
> > > > > I think this should work.
> > > > > 
> > > > > > I also wonder if once persisted, multiple containers could
> > > > > > share this
> > > > > > directory.
> > > > > 
> > > > > This directory can't be shared, as it holds runtime data
> > > > > related
> > > > > to
> > > > > Sling. For instance, a bundle that is started in instance A
> > > > > could
> > > > > be
> > > > > starting on instance B.
> > > > > 
> > > > > There is at least one file ( sling.id ) that holds data that
> > > > > must
> > > > > not
> > > > > be the same between instances.
> > > > > 
> > > > > > So I would advise marking the directory as container-
> > > > > > private
> > > > > as a
> > > > > first step.
> > > > > 
> > > > > Robert
> > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Carlos
> > > > > > 
> > > > > > 
> > > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > > camunoz@redhat.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Thanks Robert (and once again I can't stress enough how
> > > > > > > grateful I
> > > > > > > am for
> > > > > > > all your help).
> > > > > > > 
> > > > > > > Right now we deploy our container with the expectation
> > > > > > > that
> > > > > > > the
> > > > > > > mongo db
> > > > > > > is the only necessary state we need to keep; everything
> > > > > > > else
> > > > > > > is
> > > > > > > throwaway.
> > > > > > > This means that a totally new container connected to the
> > > > > > > mongodb
> > > > > > > should
> > > > > > > pick up the state and run the same as the first time it
> > > > > > > was
> > > > > > > fired
> > > > > > > up. Do
> > > > > > > you think this is an incorrect assumption? If so, what
> > > > > > > are
> > > > > > > other
> > > > > > > pieces of
> > > > > > > state we should be keeping for subsequent restarts?
> > > > > > > 
> > > > > > > This assumption has worked well for us with the current
> > > > > > > sling
> > > > > > > 11
> > > > > > > release,
> > > > > > > but it seems to break with the more up-to-date bundles.
> > > > > > > Perhaps
> > > > > > > running
> > > > > > > Sling in a container is just not meant to be.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > 
> > > > > > > Carlos
> > > > > > > 
> > > > > > > 
> > > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > > > rombert@apache.org>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > Hi Carlos,
> > > > > > > > 
> > > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level
> > > > > > > > > logs for
> > > > > > > > > every
> > > > > > > > > bundle? I tried passing a few configuration arguments
> > > > > > > > > from the
> > > > > > > > > command line
> > > > > > > > > but nothing seemed to work.
> > > > > > > > 
> > > > > > > > Try configuring the LogManager to debug at
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > > Thanks,
> > > > > > > > Robert
> > > > > > > > 
> > > > > > > > > Carlos
> > > > > > > > > 
> > > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > > > > > bdelacretaz@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > > > > > camunoz@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > ...Is there a reason why the Jcr repository could
> > > > > > > > > > > be
> > > > > > > > > > > restarting?
> > > > > > > > > > > And what
> > > > > > > > > > > class could we start looking into to debug if
> > > > > > > > > > > this is
> > > > > > > > > > > the
> > > > > > > > > > > case?...
> > > > > > > > > > 
> > > > > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > > > > components at
> > > > > > > > > > startup,
> > > > > > > > > > for various reasons.
> > > > > > > > > > 
> > > > > > > > > > The simplest way to detect and log multiple
> > > > > > > > > > repository
> > > > > > > > > > startups
> > > > > > > > > > might
> > > > > > > > > > be to implement a SlingRepositoryInitializer
> > > > > > > > > > service
> > > > > > > > > > [1]
> > > > > > > > > > that's
> > > > > > > > > > called
> > > > > > > > > > at every startup, or use the logs of an existing
> > > > > > > > > > one
> > > > > > > > > > like the
> > > > > > > > > > JCR
> > > > > > > > > > RepositoryInitializer [2] if that has anything to
> > > > > > > > > > process in
> > > > > > > > > > your
> > > > > > > > > > system.
> > > > > > > > > > 
> > > > > > > > > > -Bertrand
> > > > > > > > > > 
> > > > > > > > > > [1]
> > > > > > > > > > 
> > > > > > > > > > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > > [2]
> > > > > > > > > > 
> > > > > > > > > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
> > 
> > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Ben Radey <br...@redhat.com>.
I went ahead and created https://issues.apache.org/jira/browse/SLING-9118
for this. Although the ultimate goal here is containerization, I neglected
to include any details to that effect in the ticket, since the behavior is
reproducible without that being a complicating factor.

On Thu, Feb 20, 2020 at 7:25 AM Robert Munteanu <ro...@apache.org> wrote:

> On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> > I am following along conceptually - I want to make sure I understand
> > what's
> > being described.
> >
> > Let's say Sling Instance A starts successfully the first time. If we
> > restart Sling Instance A, we expect subsequent restarts to also
> > succeed,
> > without removing the sling directory.
> > Now let's say Sling Instance B does NOT start successfully the first
> > time.
> > Despite that, we expect subsequent restarts to succeed without
> > removing the
> > sling directory.
> >
> > Correct so far?
>
> Yes, correct.
>
> >
> > Assuming yes... what if this is running in k8s, and k8s sees that
> > Sling
> > Instance B did not start successfully, and kills the pod (removing
> > all pod
> > resources, including that pod's sling directory) in response?
> > Presumably,
> > k8s would then start Sling Instance C, which is a fresh instance with
> > no
> > sling directory. Are we saying we expect C to have a 50/50 chance of
> > starting successfully? Or have we observed different behavior?
>
> I think that only the first instance starts successfully. Additional
> instances will not start unless they have a Sling directory set up.
>
> I've tested with a third instance, once two instances are up, and it
> has the exact same behaviour.
>
> One workaround that I can suggest for a containerized environment is to
> use a supervisor script that detects the abnormal startup problem and
> restarts Sling, so that it starts up successfully.
>
> Another would be to persist the 'sling' directory as a per-container
> volume. Not sure how easy that is with k8s, but maybe you can use a
> single ReadWriteMany volume at /sling, and each pod gets their own
> ${sling.home} at /sling/${containerId} ( assuming that is exposed
> through the downward API).
>
> As these are workarounds, I would still very much like to see this
> fixed properly, so please file a bug to track this.
>
> Thanks,
> Robert
>
> >
> > Thanks,
> > Ben
> >
> > On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <ca...@redhat.com>
> > wrote:
> >
> > > Thanks for the information Robert.
> > >
> > > To replicate the issue all I needed was a mongodb (I used a full
> > > replica
> > > set, see my instructions in a previous email about how to get one
> > > going
> > > using podman) and a single process running sling.
> > >
> > > The problem does happen when I do the following:
> > >
> > > 2. Start Sling instance A, wait for it to start
> > > 3. Stop Sling instance A, wait for it to stop
> > > 4. Start Sling instance B - Error
> > >
> > > but let me add more
> > >
> > > 5. Start Sling Instance A again - Success (note I didn't remove the
> > > sling
> > > dir)
> > > 6. Start Sling instance B again - Success (note I didn't remove the
> > > sling
> > > dir)
> > >
> > > this means that even if Sling recreates the sling directory and
> > > fails the
> > > startup, next time it will succeed. Unfortunately we don't have
> > > that luxury
> > > in containers because the sling directory is not persisted.
> > >
> > > I think this is a bug, but I'll keep playing with it a bit to see
> > > if I can
> > > find out more.
> > >
> > > Carlos
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <rombert@apache.org
> > > >
> > > wrote:
> > >
> > > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > > Robert I managed to replicate the issue in a local, non-
> > > > > containerized
> > > > > environment (!!!).
> > > > >
> > > > > The problem seems to be when the database is kept but the
> > > > > 'sling'
> > > > > directory
> > > > > is cleared out across restarts (as it is for us when the
> > > > > container
> > > > > goes
> > > > > away). As I said before this doesn't seem to be a problem with
> > > > > the
> > > > > Sling 11
> > > > > bundles.
> > > > >
> > > > > The first basic solution will be to persist the 'sling'
> > > > > directory
> > > > > across
> > > > > restarts, and I was wondering if this is a bug, or as designed.
> > > >
> > > > I think this should work.
> > > >
> > > > > I also wonder if once persisted, multiple containers could
> > > > > share this
> > > > > directory.
> > > >
> > > > This directory can't be shared, as it holds runtime data related
> > > > to
> > > > Sling. For instance, a bundle that is started in instance A could
> > > > be
> > > > starting on instance B.
> > > >
> > > > There is at least one file ( sling.id ) that holds data that must
> > > > not
> > > > be the same between instances.
> > > >
> > > > > So I would advise marking the directory as container-private
> > > > as a
> > > > first step.
> > > >
> > > > Robert
> > > >
> > > > > Regards,
> > > > >
> > > > > Carlos
> > > > >
> > > > >
> > > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > > camunoz@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Robert (and once again I can't stress enough how
> > > > > > grateful I
> > > > > > am for
> > > > > > all your help).
> > > > > >
> > > > > > Right now we deploy our container with the expectation that
> > > > > > the
> > > > > > mongo db
> > > > > > is the only necessary state we need to keep; everything else
> > > > > > is
> > > > > > throwaway.
> > > > > > This means that a totally new container connected to the
> > > > > > mongodb
> > > > > > should
> > > > > > pick up the state and run the same as the first time it was
> > > > > > fired
> > > > > > up. Do
> > > > > > you think this is an incorrect assumption? If so, what are
> > > > > > other
> > > > > > pieces of
> > > > > > state we should be keeping for subsequent restarts?
> > > > > >
> > > > > > This assumption has worked well for us with the current sling
> > > > > > 11
> > > > > > release,
> > > > > > but it seems to break with the more up-to-date bundles.
> > > > > > Perhaps
> > > > > > running
> > > > > > Sling in a container is just not meant to be.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Carlos
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > > rombert@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Carlos,
> > > > > > >
> > > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level
> > > > > > > > logs for
> > > > > > > > every
> > > > > > > > bundle? I tried passing a few configuration arguments
> > > > > > > > from the
> > > > > > > > command line
> > > > > > > > but nothing seemed to work.
> > > > > > >
> > > > > > > Try configuring the LogManager to debug at
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > > Thanks,
> > > > > > > Robert
> > > > > > >
> > > > > > > > Carlos
> > > > > > > >
> > > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > > > > bdelacretaz@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > > > > camunoz@redhat.com>
> > > > > > > > > wrote:
> > > > > > > > > > ...Is there a reason why the Jcr repository could be
> > > > > > > > > > restarting?
> > > > > > > > > > And what
> > > > > > > > > > class could we start looking into to debug if this is
> > > > > > > > > > the
> > > > > > > > > > case?...
> > > > > > > > >
> > > > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > > > components at
> > > > > > > > > startup,
> > > > > > > > > for various reasons.
> > > > > > > > >
> > > > > > > > > The simplest way to detect and log multiple repository
> > > > > > > > > startups
> > > > > > > > > might
> > > > > > > > > be to implement a SlingRepositoryInitializer service
> > > > > > > > > [1]
> > > > > > > > > that's
> > > > > > > > > called
> > > > > > > > > at every startup, or use the logs of an existing one
> > > > > > > > > like the
> > > > > > > > > JCR
> > > > > > > > > RepositoryInitializer [2] if that has anything to
> > > > > > > > > process in
> > > > > > > > > your
> > > > > > > > > system.
> > > > > > > > >
> > > > > > > > > -Bertrand
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
On Mon, 2020-02-17 at 13:45 -0500, Ben Radey wrote:
> I am following along conceptually - I want to make sure I understand
> what's
> being described.
> 
> Let's say Sling Instance A starts successfully the first time. If we
> restart Sling Instance A, we expect subsequent restarts to also
> succeed,
> without removing the sling directory.
> Now let's say Sling Instance B does NOT start successfully the first
> time.
> Despite that, we expect subsequent restarts to succeed without
> removing the
> sling directory.
> 
> Correct so far?

Yes, correct.

> 
> Assuming yes... what if this is running in k8s, and k8s sees that
> Sling
> Instance B did not start successfully, and kills the pod (removing
> all pod
> resources, including that pod's sling directory) in response?
> Presumably,
> k8s would then start Sling Instance C, which is a fresh instance with
> no
> sling directory. Are we saying we expect C to have a 50/50 chance of
> starting successfully? Or have we observed different behavior?

I think that only the first instance starts successfully. Additional
instances will not start unless they have a Sling directory set up.

I've tested with a third instance, once two instances are up, and it
has the exact same behaviour.

One workaround that I can suggest for a containerized environment is to
use a supervisor script that detects the abnormal startup problem and
restarts Sling, so that it starts up successfully.
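
As a rough illustration of that supervisor idea, here is a minimal Python
sketch. It assumes the failed startup can be recognised by the "Can neither
derive user name nor principal names" message appearing in the Sling log.
The function names and the start, stop and read_log callables are
hypothetical and would have to be wired to however Sling is launched in the
container.

```python
import time

# Assumption: a failed startup is recognisable by this message in the log.
FAILURE_MARKER = "Can neither derive user name nor principal names"


def startup_failed(log_text, marker=FAILURE_MARKER):
    """Return True if the given log text contains the failure marker."""
    return marker in log_text


def supervise(start, stop, read_log, max_attempts=3, settle_seconds=0):
    """Start Sling and restart it while the new log output shows the failure.

    start, stop and read_log are caller-supplied callables (hypothetical
    hooks), so the loop makes no assumption about how Sling is launched.
    Returns the number of the attempt that came up cleanly, or 0 if none did.
    """
    for attempt in range(1, max_attempts + 1):
        seen = len(read_log())      # remember how much log existed before
        start()
        time.sleep(settle_seconds)  # give Sling time to finish starting
        text = read_log()
        if len(text) < seen:        # log was rotated or truncated
            seen = 0
        if not startup_failed(text[seen:]):
            return attempt
        stop()  # the sling directory is kept, so the next start may succeed
    return 0
```

Because the sling directory survives between attempts inside the same
container, the second start would be expected to succeed, matching the
restart behaviour Carlos described.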

Another would be to persist the 'sling' directory as a per-container
volume. Not sure how easy that is with k8s, but maybe you can use a
single ReadWriteMany volume at /sling, and each pod gets their own
${sling.home} at /sling/${containerId} ( assuming that is exposed
through the downward API).
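
A minimal sketch of deriving that per-pod directory, in Python. It assumes
the pod name is exposed to the container in an environment variable called
POD_NAME (a hypothetical name; the downward API can populate such a variable
from metadata.name) and that the shared volume is mounted at /sling. The
helper names are made up for illustration.

```python
import os
from pathlib import Path


def sling_home_for_pod(base="/sling", env=None, var="POD_NAME"):
    """Return a per-pod Sling home below the shared volume.

    base and var are assumptions: the mount point of the shared volume and
    the environment variable that carries the pod name.
    """
    env = os.environ if env is None else env
    pod = env.get(var, "").strip()
    # Refuse empty names and anything that could escape the base directory.
    if not pod or "/" in pod or pod in (".", ".."):
        raise ValueError(f"{var} must be set to a plain pod name")
    return Path(base) / pod


def ensure_sling_home(base="/sling", env=None, var="POD_NAME"):
    """Create the per-pod directory if needed and return its path."""
    home = sling_home_for_pod(base, env, var)
    home.mkdir(parents=True, exist_ok=True)
    return home
```

The returned path would then be handed to Sling as its ${sling.home}. This
only helps across restarts if the pod name is stable, as it is for
StatefulSet pods.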

As these are workarounds, I would still very much like to see this
fixed properly, so please file a bug to track this.

Thanks,
Robert

> 
> Thanks,
> Ben
> 
> On Mon, Feb 17, 2020 at 11:33 AM Carlos Munoz <ca...@redhat.com>
> wrote:
> 
> > Thanks for the information Robert.
> > 
> > To replicate the issue all I needed was a mongodb (I used a full
> > replica
> > set, see my instructions in a previous email about how to get one
> > going
> > using podman) and a single process running sling.
> > 
> > The problem does happen when I do the following:
> > 
> > 2. Start Sling instance A, wait for it to start
> > 3. Stop Sling instance A, wait for it to stop
> > 4. Start Sling instance B - Error
> > 
> > but let me add more
> > 
> > 5. Start Sling Instance A again - Success (note I didn't remove the
> > sling
> > dir)
> > 6. Start Sling instance B again - Success (note I didn't remove the
> > sling
> > dir)
> > 
> > this means that even if Sling recreates the sling directory and
> > fails the
> > startup, next time it will succeed. Unfortunately we don't have
> > that luxury
> > in containers because the sling directory is not persisted.
> > 
> > I think this is a bug, but I'll keep playing with it a bit to see
> > if I can
> > find out more.
> > 
> > Carlos
> > 
> > 
> > 
> > 
> > 
> > 
> > On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <rombert@apache.org
> > >
> > wrote:
> > 
> > > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > > Robert I managed to replicate the issue in a local, non-
> > > > containerized
> > > > environment (!!!).
> > > > 
> > > > The problem seems to be when the database is kept but the
> > > > 'sling'
> > > > directory
> > > > is cleared out across restarts (as it is for us when the
> > > > container
> > > > goes
> > > > away). As I said before this doesn't seem to be a problem with
> > > > the
> > > > Sling 11
> > > > bundles.
> > > > 
> > > > The first basic solution will be to persist the 'sling'
> > > > directory
> > > > across
> > > > restarts, and I was wondering if this is a bug, or as designed.
> > > 
> > > I think this should work.
> > > 
> > > > I also wonder if once persisted, multiple containers could
> > > > share this
> > > > directory.
> > > 
> > > This directory can't be shared, as it holds runtime data related
> > > to
> > > Sling. For instance, a bundle that is started in instance A could
> > > be
> > > starting on instance B.
> > > 
> > > There is at least one file ( sling.id ) that holds data that must
> > > not
> > > be the same between instances.
> > > 
> > > > So I would advise marking the directory as container-private
> > > as a
> > > first step.
> > > 
> > > Robert
> > > 
> > > > Regards,
> > > > 
> > > > Carlos
> > > > 
> > > > 
> > > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <
> > > > camunoz@redhat.com>
> > > > wrote:
> > > > 
> > > > > Thanks Robert (and once again I can't stress enough how
> > > > > grateful I
> > > > > am for
> > > > > all your help).
> > > > > 
> > > > > Right now we deploy our container with the expectation that
> > > > > the
> > > > > mongo db
> > > > > is the only necessary state we need to keep; everything else
> > > > > is
> > > > > throwaway.
> > > > > This means that a totally new container connected to the
> > > > > mongodb
> > > > > should
> > > > > pick up the state and run the same as the first time it was
> > > > > fired
> > > > > up. Do
> > > > > you think this is an incorrect assumption? If so, what are
> > > > > other
> > > > > pieces of
> > > > > state we should be keeping for subsequent restarts?
> > > > > 
> > > > > This assumption has worked well for us with the current sling
> > > > > 11
> > > > > release,
> > > > > but it seems to break with the more up-to-date bundles.
> > > > > Perhaps
> > > > > running
> > > > > Sling in a container is just not meant to be.
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Carlos
> > > > > 
> > > > > 
> > > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <
> > > > > > rombert@apache.org>
> > > > > wrote:
> > > > > 
> > > > > > Hi Carlos,
> > > > > > 
> > > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level
> > > > > > > logs for
> > > > > > > every
> > > > > > > bundle? I tried passing a few configuration arguments
> > > > > > > from the
> > > > > > > command line
> > > > > > > but nothing seemed to work.
> > > > > > 
> > > > > > Try configuring the LogManager to debug at
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > > > Thanks,
> > > > > > Robert
> > > > > > 
> > > > > > > Carlos
> > > > > > > 
> > > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > > > bdelacretaz@apache.org>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > > > camunoz@redhat.com>
> > > > > > > > wrote:
> > > > > > > > > ...Is there a reason why the Jcr repository could be
> > > > > > > > > restarting?
> > > > > > > > > And what
> > > > > > > > > class could we start looking into to debug if this is
> > > > > > > > > the
> > > > > > > > > case?...
> > > > > > > > 
> > > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > > components at
> > > > > > > > startup,
> > > > > > > > for various reasons.
> > > > > > > > 
> > > > > > > > The simplest way to detect and log multiple repository
> > > > > > > > startups
> > > > > > > > might
> > > > > > > > be to implement a SlingRepositoryInitializer service
> > > > > > > > [1]
> > > > > > > > that's
> > > > > > > > called
> > > > > > > > at every startup, or use the logs of an existing one
> > > > > > > > like the
> > > > > > > > JCR
> > > > > > > > RepositoryInitializer [2] if that has anything to
> > > > > > > > process in
> > > > > > > > your
> > > > > > > > system.
> > > > > > > > 
> > > > > > > > -Bertrand
> > > > > > > > 
> > > > > > > > [1]
> > > > > > > > 
> > > > > > > > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > > [2]
> > > > > > > > 
> > > > > > > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Ben Radey <br...@redhat.com>.
I am following along conceptually - I want to make sure I understand what's
being described.

Let's say Sling Instance A starts successfully the first time. If we
restart Sling Instance A, we expect subsequent restarts to also succeed,
without removing the sling directory.
Now let's say Sling Instance B does NOT start successfully the first time.
Despite that, we expect subsequent restarts to succeed without removing the
sling directory.

Correct so far?

Assuming yes... what if this is running in k8s, and k8s sees that Sling
Instance B did not start successfully, and kills the pod (removing all pod
resources, including that pod's sling directory) in response? Presumably,
k8s would then start Sling Instance C, which is a fresh instance with no
sling directory. Are we saying we expect C to have a 50/50 chance of
starting successfully? Or have we observed different behavior?

Thanks,
Ben


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks for the information Robert.

To replicate the issue all I needed was a mongodb (I used a full replica
set, see my instructions in a previous email about how to get one going
using podman) and a single process running sling.

The problem does happen when I do the following:

1. Deploy MongoDB
2. Start Sling instance A, wait for it to start
3. Stop Sling instance A, wait for it to stop
4. Start Sling instance B - Error

But let me add more:

5. Start Sling Instance A again - Success (note I didn't remove the sling
dir)
6. Start Sling instance B again - Success (note I didn't remove the sling
dir)

This means that even if Sling recreates the sling directory and fails the
startup, next time it will succeed. Unfortunately we don't have that luxury
in containers because the sling directory is not persisted.
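
As a rough sketch, the sequence above could be scripted as follows. The
launcher jar name, the oak_mongo run mode and oak.mongo.uri property, the
MongoDB port, and the reading of "instance B" as a launch with a fresh Sling
home directory (the -c option) are all assumptions, not details confirmed in
this thread. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: jar name, run mode and URI below are assumptions.
SLING_JAR="${SLING_JAR:-org.apache.sling.starter.jar}"   # hypothetical file name
MONGO_URI="${MONGO_URI:-mongodb://localhost:30001}"      # assumed replica set member

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Launch Sling in the foreground with the given home directory.
# When not in dry-run mode, stop it (Ctrl-C) once startup has finished.
start_instance() {
  run java -Dsling.run.modes=oak_mongo "-Doak.mongo.uri=$MONGO_URI" \
    -jar "$SLING_JAR" -c "$1"
}

# Steps 2 to 6: A, then B with a fresh home, then A and B again
# without removing either home directory.
repro() {
  start_instance sling-A   # steps 2-3: first start against the database
  start_instance sling-B   # step 4: fresh home, same database (error seen here)
  start_instance sling-A   # step 5: home directory kept
  start_instance sling-B   # step 6: home directory kept
}
```

Calling `repro` with DRY_RUN=1 lists the four launches; setting DRY_RUN=0 runs
them one after another, each until it is stopped.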

I think this is a bug, but I'll keep playing with it a bit to see if I can
find out more.

Carlos







Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> Robert I managed to replicate the issue in a local, non-containerized
> environment (!!!).
> 
> The problem seems to be when the database is kept but the 'sling'
> directory is cleared out across restarts (as it is for us when the
> container goes away). As I said before this doesn't seem to be a
> problem with the Sling 11 bundles.
>
> The first basic solution will be to persist the 'sling' directory
> across restarts, and I was wondering if this is a bug, or as designed.

I think this should work.

> 
> I also wonder if once persisted, multiple containers could share this
> directory.

This directory can't be shared, as it holds runtime data related to
Sling. For instance, a bundle that is started in instance A could be
starting on instance B.

There is at least one file ( sling.id ) that holds data that must not
be the same between instances.

So I would advise marking the directory as container-private as a
first step.
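
A small sanity check for the sling.id point, sketched under the assumption
that each instance's sling.id file can be read at a known path (the exact
location inside the sling directory is not stated in this thread, so the
paths are passed in by the caller):

```shell
# Succeeds only if both files exist and their contents differ,
# i.e. the two instances do not share the same Sling ID.
distinct_sling_ids() {
  [ -f "$1" ] && [ -f "$2" ] && ! cmp -s "$1" "$2"
}
```

For example, `distinct_sling_ids /data/a/sling.id /data/b/sling.id` (both
paths hypothetical) returns a non-zero status if two containers ended up with
a copy of the same directory.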

Robert



Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Robert I managed to replicate the issue in a local, non-containerized
environment (!!!).

The problem seems to be when the database is kept but the 'sling' directory
is cleared out across restarts (as it is for us when the container goes
away). As I said before this doesn't seem to be a problem with the Sling 11
bundles.

The first basic solution will be to persist the 'sling' directory across
restarts, and I was wondering if this is a bug, or as designed.

I also wonder if once persisted, multiple containers could share this
directory.

Regards,

Carlos



Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Hi Carlos,

On Fri, 2020-02-14 at 15:17 -0500, Carlos Munoz wrote:
> Thanks Robert (and once again I can't stress enough how grateful I am
> for all your help).
>
> Right now we deploy our container with the expectation that the mongo
> db is the only necessary state we need to keep; everything else is
> throwaway. This means that a totally new container connected to the
> mongodb should pick up the state and run the same as the first time it
> was fired up. Do you think this is an incorrect assumption? If so, what
> are other pieces of state we should be keeping for subsequent restarts?

A MongoDB deployment should be able to support new Sling clients
connecting to it, and those won't have a 'sling' directory. Does the
problem happen if you:

1. Deploy MongoDB
2. Start Sling instance A, wait for it to start
3. Stop Sling instance A, wait for it to stop
4. Start Sling instance B

?

I'll try and replicate the setup from your previous email, but that may
take a while more, sorry.

> 
> This assumption has worked well for us with the current sling 11
> release, but it seems to break with the more up-to-date bundles.
> Perhaps running Sling in a container is just not meant to be.

Actually we've been doing quite a lot of that :-) So I definitely think
there is a way of doing that.

Thanks,
Robert



Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Robert (and once again I can't stress enough how grateful I am for
all your help).

Right now we deploy our container with the expectation that the mongo db is
the only necessary state we need to keep; everything else is throwaway.
This means that a totally new container connected to the mongodb should
pick up the state and run the same as the first time it was fired up. Do
you think this is an incorrect assumption? If so, what are other pieces of
state we should be keeping for subsequent restarts?

This assumption has worked well for us with the current sling 11 release,
but it seems to break with the more up-to-date bundles. Perhaps running
Sling in a container is just not meant to be.

Regards,

Carlos



Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
Hi Carlos,

On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> Thanks Bertrand. How can I run Sling with DEBUG-level logs for every
> bundle? I tried passing a few configuration arguments from the
> command line but nothing seemed to work.

Try configuring the LogManager to debug at 

  https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138

Thanks,
Robert



Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Bertrand. How can I run Sling with DEBUG-level logs for every
bundle? I tried passing a few configuration arguments from the command line
but nothing seemed to work.

Carlos


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <ca...@redhat.com> wrote:
> ...Is there a reason why the Jcr repository could be restarting? And what
> class could we start looking into to debug if this is the case?...

It's not uncommon to see extra restarts of OSGi components at startup,
for various reasons.

The simplest way to detect and log multiple repository startups might
be to implement a SlingRepositoryInitializer service [1] that's called
at every startup, or use the logs of an existing one like the JCR
RepositoryInitializer [2] if that has anything to process in your
system.
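
For the second option (reading the logs of an existing initializer), a
minimal sketch could count the log lines written by a given logger. It
assumes the log pattern includes the logger name and that the log file path
is known; neither is confirmed in this thread.

```shell
# Count the lines in a log file that were written by a given logger.
# Prints 0 (rather than failing) when nothing matches.
count_logger_lines() {
  logger_name="$1"
  log_file="$2"
  grep -cF -- "$logger_name" "$log_file" || true
}
```

A call such as `count_logger_lines
org.apache.sling.jcr.repoinit.impl.RepositoryInitializer sling/logs/error.log`
(class name taken from [2], log path assumed) gives a number to compare
between a clean start and a failing one. One startup may log several lines,
so the count indicates repeated runs only relative to a known-good baseline.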

-Bertrand

[1] https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
[2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Here are the steps we follow to create a replica set in podman. We use a
pod to hold all the mongo instances, a concept that is probably not
available in docker:

1. Create a pod

podman pod create --name mongo-replica-set -p 30001 -p 30002 -p 30003

2. Create 3 mongo containers in the pod (the recommended minimum for a
replica set). Each container will expose a different port for mongo
(30001-30003)

podman run -d --pod mongo-replica-set --name mongo1 mongo:3.6 mongod \
  --port 30001 --replSet mongo-repl-set
podman run -d --pod mongo-replica-set --name mongo2 mongo:3.6 mongod \
  --port 30002 --replSet mongo-repl-set
podman run -d --pod mongo-replica-set --name mongo3 mongo:3.6 mongod \
  --port 30003 --replSet mongo-repl-set

3. To verify the previous step, connect to mongo on one of the specified
ports (this assumes you have a local installation of the mongo client)

mongo --port 30001



4. Once connected, create a connection to a database (in this case, the
'test' database but it may be the 'pantheondb' one which is what's used in
production)

> db = (new Mongo('localhost:30001')).getDB('test')



5. To tie all three containers in a replica set, create a configuration in
the form of a json object (this is still happening in the mongo shell)



> config = {
   "_id" : "mongo-repl-set",
   "members" : [
   {
     "_id" : 0,
     "host" : "localhost:30001"
   },
   {
     "_id" : 1,
     "host" : "localhost:30002"
   },
   {
     "_id" : 2,
     "host" : "localhost:30003"
   }]
}





6. Use the configuration object to initialize the replica set

> rs.initiate(config)

You should see a confirmation message of this type:

{
  "ok" : 1,
  "operationTime" : Timestamp(1581368609, 1),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1581368609, 1),
    "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
}



All members of the replica set should now be available via localhost:port
either from within the pod, or exposed to the local machine via the pod
created in step 1.
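
Steps 5 and 6 can also be done without an interactive session. The helper
below is only a sketch: the function name is made up for illustration, and it
assumes every member is reachable as localhost:port, as in the steps above.

```shell
# Build the argument for rs.initiate(): replica set name first, then one
# port per member. Hosts are assumed to be localhost.
replset_config() {
  name="$1"
  shift
  members=""
  id=0
  for port in "$@"; do
    if [ -n "$members" ]; then
      members="$members, "
    fi
    members="$members{ \"_id\": $id, \"host\": \"localhost:$port\" }"
    id=$((id + 1))
  done
  printf '{ "_id": "%s", "members": [ %s ] }\n' "$name" "$members"
}
```

With the containers from step 2 running and a local mongo client installed,
the replica set could then be initiated with:

mongo --port 30001 --eval "rs.initiate($(replset_config mongo-repl-set 30001 30002 30003))"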

Is there a reason why the Jcr repository could be restarting? And what
class could we start looking into to debug if this is the case? I have
unsuccessfully tried to increase the logging level in Sling but nothing
I've done has worked.

Carlos

On Tue, Feb 11, 2020 at 4:43 AM Robert Munteanu <ro...@apache.org> wrote:

> On Mon, 2020-02-10 at 17:16 -0500, Carlos Munoz wrote:
> > Thanks Sergiu, I will give that a shot (removing that configuration
> > item I
> > mean).
> >
> > I actually managed to replicate some of the weird symptoms locally.
> > It
> > required me to set up a local mongo replica set. (I used podman, so
> > let me
> > know if you need the steps I followed to do this). The first time I
> > ran
> > sling against my local database it worked, but after running the
> > second
> > time it froze up before our bundle could even come up.
>
> Steps to reproduce are always great :-)
>
> I don't have podman installed, but IIRC commands podman and docker CLI
> commands should be compatible, so feel free to add those.
>
> >
> > I did see a lot of entries like these, where it seems to be registering
> > and
> > subsequently unregistering the same service(s) multiple times:
>
> (snip)
>
> Sounds like something low-level is restarting, causing lots of other
> services to restart. Maybe the repository itself?
>
> Robert
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
On Mon, 2020-02-10 at 17:16 -0500, Carlos Munoz wrote:
> Thanks Sergiu, I will give that a shot (removing that configuration
> item I
> mean).
> 
> I actually managed to replicate some of the weird symptoms locally.
> It
> required me to set up a local mongo replica set. (I used podman, so
> let me
> know if you need the steps I followed to do this). The first time I
> ran
> sling against my local database it worked, but after running the
> second
> time it froze up before our bundle could even come up.

Steps to reproduce are always great :-)

I don't have podman installed, but IIRC podman and docker CLI
commands should be compatible, so feel free to add those.

> 
> I did see a lot of entries like these, where it seems to be registering
> and
> subsequently unregistering the same service(s) multiple times:

(snip)

Sounds like something low-level is restarting, causing lots of other
services to restart. Maybe the repository itself?

Robert


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Sergiu, I will give that a shot (removing that configuration item I
mean).

I actually managed to replicate some of the weird symptoms locally. It
required me to set up a local mongo replica set. (I used podman, so let me
know if you need the steps I followed to do this). The first time I ran
sling against my local database it worked, but after running the second
time it froze up before our bundle could even come up.

I did see a lot of entries like these, where it seems to be registering and
subsequently unregistering the same service(s) multiple times:

[PANTHEON] 16:21:00.907 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,953, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:00.909 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,954, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:00.911 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,951, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:00.966 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,953, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:00.969 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,954, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:00.971 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,951, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:01.202 INFO  c.r.p.c.GlobalConfig - Initializing template directories at /tmp/templates3435596101097019968
[PANTHEON] 16:21:01.212 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1002, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:01.214 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1003, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:01.215 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1001, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:01.570 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1002, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:01.571 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1003, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:01.573 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1001, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:01.799 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1101, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:01.801 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1102, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:01.802 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1100, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.213 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1101, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.215 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1102, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.216 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1100, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.438 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1206, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.440 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1207, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.441 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1205, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.676 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1206, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.678 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1207, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.679 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1205, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:02.922 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1305, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.924 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1306, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:02.925 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1304, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:03.232 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1305, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:03.233 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1306, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:03.234 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1304, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:03.462 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1404, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:03.463 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1405, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:03.465 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1403, [com.redhat.pantheon.extension.Events]] ServiceEvent REGISTERED
[PANTHEON] 16:21:03.755 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.PublishDraftVersion,1404, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:03.757 INFO  c.r.pantheon - Service [com.redhat.pantheon.servlet.module.UnpublishVersion,1405, [org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:03.758 INFO  c.r.pantheon - Service [com.redhat.pantheon.extension.Events,1403, [com.redhat.pantheon.extension.Events]] ServiceEvent UNREGISTERING
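
To get a feel for how often each service cycles, the entries above can be
tallied with a small script. This is a rough Python sketch, assuming the log
format is exactly the one shown in this excerpt; count_service_churn and the
EVENT pattern are hypothetical names, not anything shipped with Sling.

```python
import re
from collections import defaultdict

# Matches one service event; \s+ tolerates entries wrapped across lines.
EVENT = re.compile(
    r"Service\s+\[([\w.$]+),(\d+),\s*\[[^\]]*\]\]\s+"
    r"ServiceEvent\s+(REGISTERED|UNREGISTERING)"
)


def count_service_churn(log_text):
    """Return {service class: {event type: count}} for all events found."""
    counts = defaultdict(lambda: defaultdict(int))
    for name, _service_id, event in EVENT.findall(log_text):
        counts[name][event] += 1
    return {name: dict(events) for name, events in counts.items()}


# A few entries copied from the excerpt above.
sample = """
[PANTHEON] 16:21:00.907 INFO  c.r.pantheon - Service
[com.redhat.pantheon.servlet.module.PublishDraftVersion,953,
[org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
[PANTHEON] 16:21:00.966 INFO  c.r.pantheon - Service
[com.redhat.pantheon.servlet.module.PublishDraftVersion,953,
[org.apache.sling.servlets.post.PostOperation]] ServiceEvent UNREGISTERING
[PANTHEON] 16:21:01.212 INFO  c.r.pantheon - Service
[com.redhat.pantheon.servlet.module.PublishDraftVersion,1002,
[org.apache.sling.servlets.post.PostOperation]] ServiceEvent REGISTERED
"""

for name, events in count_service_churn(sample).items():
    print(name, events)
```

A service that shows several REGISTERED and UNREGISTERING events within a few
seconds, each time with a new service id, is being restarted rather than
started once.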

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Sergiu Dumitriu <sd...@apache.org>.
This provisioning snippet triggered it for me, but in a more customized
instance, so I'm not sure if it is enough by itself to cause issues:

[configurations]
    org.apache.sling.auth.form.FormAuthenticationHandler
        form.login.form="/login"


On 2/10/20 4:48 AM, Robert Munteanu wrote:
> I tried to reproduce this but I could not, unfortunately. What I did
> was
> 
> 1. Built the latest Sling Starter locally
> 2. Started up MongoDB
> 
> $ docker run --name mongo-sling -p 27017:27017 -d mongo:3.6
> 
> 3. Started up Sling
> $ java -jar target/org.apache.sling.starter-12-SNAPSHOT.jar -Dsling.run.modes=oak_mongo
> 
> 4. Waited for Sling to start, clicked around a bit and then shut it
> down with Ctrl-C.
> 
> 5. Started up Sling again, with the same command as in step 3.
> 
> Sling started up successfully, no errors in the log.
> 
> Do you have some simplified steps to reproduce I can try? Sharing a k8s
> resource file is fine as well, as long as it is self-contained and I
> can `kubectl apply -f` it and then follow some directions to get Sling
> broken.
> 
> Thanks,
> Robert
> 
> 
> On Fri, 2020-02-07 at 16:58 -0500, Carlos Munoz wrote:
>> Thanks Robert. We tried ensuring only a single Sling pod was hitting
>> the
>> database at one time with some strange results:
>>
>> The first time it runs (against an empty database) everything goes
>> well:
>> the database is populated and the pod comes up with no issues.
>>
>> We then bring this pod down, and then try to bring the same exact one
>> up
>> again with the original exception popping up again:
>>
>> 29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup
>> Thread #4]
>> ERROR: Bundle '160' EventDispatcher: Error during dispatch.
>> (org.apache.sling.api.SlingException: Can't create the JCR event
>> listener.)
>> org.apache.sling.api.SlingException: Can't create the JCR event
>> listener.
>> ...
>> ...
>> Caused by: javax.jcr.LoginException: Can neither derive user name nor
>> principal names for bundle org.apache.sling.jcr.resource [154] and
>> sub
>> service observation
>> at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:387)
>>
>> I wonder if the sling pod is leaving the database in an unusable
>> state when
>> being brought down.
>>
>> Regards,
>>
>> Carlos
>>
>> On Thu, Feb 6, 2020 at 4:11 AM Robert Munteanu <ro...@apache.org>
>> wrote:
>>
>>> On Wed, 2020-02-05 at 21:17 -0500, Carlos Munoz wrote:
>>>> Hi all,
>>>>
>>>> I think I have a theory for our issues here, and it may have to
>>>> do
>>>> with the
>>>> fact that we are running on a heavily containerized environment
>>>> (kubernetes). I wanted to consult with the community experts to
>>>> see
>>>> what
>>>> you thought.
>>>>
>>>> The way our container platform works on an update is that it will
>>>> try
>>>> to
>>>> bring up a new container with sling (and our application) against
>>>> the
>>>> same
>>>> mongo database that an original (and still running) container is
>>>> running
>>>> against. Now this works fine when the only thing being updated is
>>>> our
>>>> application bundle, but it starts encountering problems when
>>>> several
>>>> other
>>>> bundles and configurations are being updated (some removed, some
>>>> added,
>>>> some updated). I *think* the core of the problem here is that the
>>>> bundles
>>>> and configurations are all stored in the database itself, and two
>>>> containers with potentially different bundle versions and
>>>> configurations
>>>> are attempting to use it simultaneously.
>>>
>>> That is a pretty good guess I'd say :-)
>>>
>>> I did see some similar problems when using Sling for development
>>> purposes on k8s. I never went to production with it, but for my own
>>> purposes it was enough to ensure that only one Sling pod was
>>> starting
>>> up at a time. Maybe you can try that as well?
>>>
>>> A more involved solution would be to use the CompositeNodeStore
>>> [1],
>>> which is designed to separate the storage of /libs and /apps from
>>> the
>>> rest of the repository. So for instance you'd have /libs and /apps
>>> stored on a local segment store for each pod, and the rest of the
>>> content in Mongo.
>>>
>>> Unfortunately there is very little documentation and no tooling
>>> around
>>> it available, so that makes it a difficult proposition.
>>>
>>> Thanks,
>>> Robert
>>>
>>>
>>> [1]: 
>>> https://jackrabbit.apache.org/oak/docs/nodestore/compositens.html
>>>
>>>> If I am right, then our core problem to figure out is how to
>>>> upgrade
>>>> a
>>>> database from one sling version to the next.
>>>>
>>>> Let me know what you all think.
>>>>
>>>> Regards,
>>>>
>>>> Carlos
>>>>
>>>> On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <ca...@redhat.com>
>>>> wrote:
>>>>
>>>>> Thanks Bertrand! I will continue my fact finding mission here
>>>>> :)
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos
>>>>>
>>>>> On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <
>>>>> bdelacretaz@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <
>>>>>> camunoz@redhat.com>
>>>>>> wrote:
>>>>>>> ...do configurations from the
>>>>>>> repoinit files get installed in a specific order with
>>>>>>> relation
>>>>>>> to the
>>>>>>> artifacts?...
>>>>>>
>>>>>> The repoinit configs are applied by a single
>>>>>> SlingRepositoryInitializer [1] service which is implemented
>>>>>> by
>>>>>> org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
>>>>>>
>>>>>> The execution order of the SlingRepositoryInitializer
>>>>>> services is
>>>>>> based on their service rankings [4] and the
>>>>>> RepositoryInitializer
>>>>>> processes its configurations in the order in which they are
>>>>>> provided
>>>>>> by the OSGi framework, sequentially.
>>>>>>
>>>>>> All this happens before the SlingRepository service is made
>>>>>> available [3]
>>>>>>
>>>>>> The logs should help understand what's going on but IIRC it
>>>>>> all
>>>>>> happens in a single thread.
>>>>>>
>>>>>> -Bertrand
>>>>>>
>>>>>> [1]
>>>>>> https://sling.apache.org/documentation/bundles/repository-initialization.html
>>>>>> [2]
>>>>>> https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
>>>>>> [3]
>>>>>> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
>>>>>> [4]
>>>>>> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
>>>>>>
> 


-- 
Sergiu Dumitriu
http://purl.org/net/sergiu

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
I tried to reproduce this but unfortunately could not. What I did was:

1. Built the latest Sling Starter locally.

2. Started up MongoDB:

$ docker run --name mongo-sling -p 27017:27017 -d mongo:3.6

3. Started up Sling:

$ java -jar target/org.apache.sling.starter-12-SNAPSHOT.jar -Dsling.run.modes=oak_mongo

4. Waited for Sling to start, clicked around a bit and then shut it
down with Ctrl-C.

5. Started up Sling again, with the same command as in step 3.

Sling started up successfully, with no errors in the log.

Do you have some simplified steps to reproduce I can try? Sharing a k8s
resource file is fine as well, as long as it is self-contained and I
can `kubectl apply -f` it and then follow some directions to get Sling
broken.

Thanks,
Robert


On Fri, 2020-02-07 at 16:58 -0500, Carlos Munoz wrote:
> Thanks Robert. We tried ensuring only a single Sling pod was hitting
> the
> database at one time with some strange results:
> 
> The first time it runs (against an empty database) everything goes
> well:
> the database is populated and the pod comes up with no issues.
> 
> We then bring this pod down, and then try to bring the same exact one
> up
> again with the original exception popping up again:
> 
> 29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup
> Thread #4]
> ERROR: Bundle '160' EventDispatcher: Error during dispatch.
> (org.apache.sling.api.SlingException: Can't create the JCR event
> listener.)
> org.apache.sling.api.SlingException: Can't create the JCR event
> listener.
> ...
> ...
> Caused by: javax.jcr.LoginException: Can neither derive user name nor
> principal names for bundle org.apache.sling.jcr.resource [154] and
> sub
> service observation
> at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:387)
> 
> I wonder if the sling pod is leaving the database in an unusable
> state when
> being brought down.
> 
> Regards,
> 
> Carlos
> 
> On Thu, Feb 6, 2020 at 4:11 AM Robert Munteanu <ro...@apache.org>
> wrote:
> 
> > On Wed, 2020-02-05 at 21:17 -0500, Carlos Munoz wrote:
> > > Hi all,
> > > 
> > > I think I have a theory for our issues here, and it may have to
> > > do
> > > with the
> > > fact that we are running on a heavily containerized environment
> > > (kubernetes). I wanted to consult with the community experts to
> > > see
> > > what
> > > you thought.
> > > 
> > > The way our container platform works on an update is that it will
> > > try
> > > to
> > > bring up a new container with sling (and our application) against
> > > the
> > > same
> > > mongo database that an original (and still running) container is
> > > running
> > > against. Now this works fine when the only thing being updated is
> > > our
> > > application bundle, but it starts encountering problems when
> > > several
> > > other
> > > bundles and configurations are being updated (some removed, some
> > > added,
> > > some updated). I *think* the core of the problem here is that the
> > > bundles
> > > and configurations are all stored in the database itself, and two
> > > containers with potentially different bundle versions and
> > > configurations
> > > are attempting to use it simultaneously.
> > 
> > That is a pretty good guess I'd say :-)
> > 
> > I did see some similar problems when using Sling for development
> > purposes on k8s. I never went to production with it, but for my own
> > purposes it was enough to ensure that only one Sling pod was
> > starting
> > up at a time. Maybe you can try that as well?
> > 
> > A more involved solution would be to use the CompositeNodeStore
> > [1],
> > which is designed to separate the storage of /libs and /apps from
> > the
> > rest of the repository. So for instance you'd have /libs and /apps
> > stored on a local segment store for each pod, and the rest of the
> > content in Mongo.
> > 
> > Unfortunately there is very little documentation and no tooling
> > around
> > it available, so that makes it a difficult proposition.
> > 
> > Thanks,
> > Robert
> > 
> > 
> > [1]: 
> > https://jackrabbit.apache.org/oak/docs/nodestore/compositens.html
> > 
> > > If I am right, then our core problem to figure out is how to
> > > upgrade
> > > a
> > > database from one sling version to the next.
> > > 
> > > Let me know what you all think.
> > > 
> > > Regards,
> > > 
> > > Carlos
> > > 
> > > On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > > 
> > > > Thanks Bertrand! I will continue my fact finding mission here
> > > > :)
> > > > 
> > > > Regards,
> > > > 
> > > > Carlos
> > > > 
> > > > On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <
> > > > bdelacretaz@apache.org>
> > > > wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <
> > > > > camunoz@redhat.com>
> > > > > wrote:
> > > > > > ...do configurations from the
> > > > > > repoinit files get installed in a specific order with
> > > > > > relation
> > > > > > to the
> > > > > > artifacts?...
> > > > > 
> > > > > The repoinit configs are applied by a single
> > > > > SlingRepositoryInitializer [1] service which is implemented
> > > > > by
> > > > > org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
> > > > > 
> > > > > The execution order of the SlingRepositoryInitializer
> > > > > services is
> > > > > based on their service rankings [4] and the
> > > > > RepositoryInitializer
> > > > > processes its configurations in the order in which they are
> > > > > provided
> > > > > by the OSGi framework, sequentially.
> > > > > 
> > > > > All this happens before the SlingRepository service is made
> > > > > available [3]
> > > > > 
> > > > > The logs should help understand what's going on but IIRC it
> > > > > all
> > > > > happens in a single thread.
> > > > > 
> > > > > -Bertrand
> > > > > 
> > > > > [1]
> > > > > 
> > https://sling.apache.org/documentation/bundles/repository-initialization.html
> > > > > [2]
> > > > > 
> > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
> > > > > [3]
> > > > > 
> > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
> > > > > [4]
> > > > > 
> > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
> > > > > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Robert. We tried ensuring only a single Sling pod was hitting the
database at one time with some strange results:

The first time it runs (against an empty database) everything goes well:
the database is populated and the pod comes up with no issues.

We then bring this pod down and try to bring the exact same one up again,
at which point the original exception shows up again:

29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup Thread #4]
ERROR: Bundle '160' EventDispatcher: Error during dispatch.
(org.apache.sling.api.SlingException: Can't create the JCR event listener.)
org.apache.sling.api.SlingException: Can't create the JCR event listener.
...
...
Caused by: javax.jcr.LoginException: Can neither derive user name nor
principal names for bundle org.apache.sling.jcr.resource [154] and sub
service observation
at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:387)

I wonder if the sling pod is leaving the database in an unusable state when
being brought down.
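
In case it helps narrow things down, the exception message names both the
bundle and the sub service whose login failed, which as far as I understand is
the pair that Sling resolves through its service user mapping. Below is a
minimal Python sketch for listing every such pair in a log, assuming the
message format is exactly the one in the stack trace above;
find_unmapped_service_logins is a hypothetical helper name.

```python
import re

# Message format copied from the stack trace above.
LOGIN_FAILURE = re.compile(
    r"Can neither derive user name nor principal names for bundle "
    r"(\S+) \[(\d+)\] and sub service (\S+)"
)


def find_unmapped_service_logins(log_text):
    """Return sorted, de-duplicated (bundle, bundle id, sub service) tuples."""
    # Collapse all whitespace first so that wrapped log lines still match.
    flattened = " ".join(log_text.split())
    matches = LOGIN_FAILURE.findall(flattened)
    return sorted({(name, int(bundle_id), sub) for name, bundle_id, sub in matches})


sample = """
Caused by: javax.jcr.LoginException: Can neither derive user name nor
principal names for bundle org.apache.sling.jcr.resource [154] and sub
service observation
at
org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:387)
"""

print(find_unmapped_service_logins(sample))
```

If the list differs between the first (working) start and the second one, that
would at least show which mappings are missing when the pod comes back up.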

Regards,

Carlos

On Thu, Feb 6, 2020 at 4:11 AM Robert Munteanu <ro...@apache.org> wrote:

> On Wed, 2020-02-05 at 21:17 -0500, Carlos Munoz wrote:
> > Hi all,
> >
> > I think I have a theory for our issues here, and it may have to do
> > with the
> > fact that we are running on a heavily containerized environment
> > (kubernetes). I wanted to consult with the community experts to see
> > what
> > you thought.
> >
> > The way our container platform works on an update is that it will try
> > to
> > bring up a new container with sling (and our application) against the
> > same
> > mongo database that an original (and still running) container is
> > running
> > against. Now this works fine when the only thing being updated is our
> > application bundle, but it starts encountering problems when several
> > other
> > bundles and configurations are being updated (some removed, some
> > added,
> > some updated). I *think* the core of the problem here is that the
> > bundles
> > and configurations are all stored in the database itself, and two
> > containers with potentially different bundle versions and
> > configurations
> > are attempting to use it simultaneously.
>
> That is a pretty good guess I'd say :-)
>
> I did see some similar problems when using Sling for development
> purposes on k8s. I never went to production with it, but for my own
> purposes it was enough to ensure that only one Sling pod was starting
> up at a time. Maybe you can try that as well?
>
> A more involved solution would be to use the CompositeNodeStore [1],
> which is designed to separate the storage of /libs and /apps from the
> rest of the repository. So for instance you'd have /libs and /apps
> stored on a local segment store for each pod, and the rest of the
> content in Mongo.
>
> Unfortunately there is very little documentation and no tooling around
> it available, so that makes it a difficult proposition.
>
> Thanks,
> Robert
>
>
> [1]: https://jackrabbit.apache.org/oak/docs/nodestore/compositens.html
>
> >
> > If I am right, then our core problem to figure out is how to upgrade
> > a
> > database from one sling version to the next.
> >
> > Let me know what you all think.
> >
> > Regards,
> >
> > Carlos
> >
> > On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <ca...@redhat.com>
> > wrote:
> >
> > > Thanks Bertrand! I will continue my fact finding mission here :)
> > >
> > > Regards,
> > >
> > > Carlos
> > >
> > > On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <
> > > bdelacretaz@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <ca...@redhat.com>
> > > > wrote:
> > > > > ...do configurations from the
> > > > > repoinit files get installed in a specific order with relation
> > > > > to the
> > > > > artifacts?...
> > > >
> > > > The repoinit configs are applied by a single
> > > > SlingRepositoryInitializer [1] service which is implemented by
> > > > org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
> > > >
> > > > The execution order of the SlingRepositoryInitializer services is
> > > > based on their service rankings [4] and the RepositoryInitializer
> > > > processes its configurations in the order in which they are
> > > > provided
> > > > by the OSGi framework, sequentially.
> > > >
> > > > All this happens before the SlingRepository service is made
> > > > available [3]
> > > >
> > > > The logs should help understand what's going on but IIRC it all
> > > > happens in a single thread.
> > > >
> > > > -Bertrand
> > > >
> > > > [1]
> > > > https://sling.apache.org/documentation/bundles/repository-initialization.html
> > > > [2]
> > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
> > > > [3]
> > > > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
> > > > [4]
> > > > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
> > > >
> > > >
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Robert Munteanu <ro...@apache.org>.
On Wed, 2020-02-05 at 21:17 -0500, Carlos Munoz wrote:
> Hi all,
> 
> I think I have a theory for our issues here, and it may have to do
> with the
> fact that we are running on a heavily containerized environment
> (kubernetes). I wanted to consult with the community experts to see
> what
> you thought.
> 
> The way our container platform works on an update is that it will try
> to
> bring up a new container with sling (and our application) against the
> same
> mongo database that an original (and still running) container is
> running
> against. Now this works fine when the only thing being updated is our
> application bundle, but it starts encountering problems when several
> other
> bundles and configurations are being updated (some removed, some
> added,
> some updated). I *think* the core of the problem here is that the
> bundles
> and configurations are all stored in the database itself, and two
> containers with potentially different bundle versions and
> configurations
> are attempting to use it simultaneously.

That is a pretty good guess I'd say :-)

I did see some similar problems when using Sling for development
purposes on k8s. I never went to production with it, but for my own
purposes it was enough to ensure that only one Sling pod was starting
up at a time. Maybe you can try that as well?

A more involved solution would be to use the CompositeNodeStore [1],
which is designed to separate the storage of /libs and /apps from the
rest of the repository. So for instance you'd have /libs and /apps
stored on a local segment store for each pod, and the rest of the
content in Mongo.

Unfortunately there is very little documentation and no tooling around
it available, so that makes it a difficult proposition.

Thanks,
Robert


[1]: https://jackrabbit.apache.org/oak/docs/nodestore/compositens.html

> 
> If I am right, then our core problem to figure out is how to upgrade
> a
> database from one sling version to the next.
> 
> Let me know what you all think.
> 
> Regards,
> 
> Carlos
> 
> On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <ca...@redhat.com>
> wrote:
> 
> > Thanks Bertrand! I will continue my fact finding mission here :)
> > 
> > Regards,
> > 
> > Carlos
> > 
> > On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <
> > bdelacretaz@apache.org>
> > wrote:
> > 
> > > Hi,
> > > 
> > > On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > > > ...do configurations from the
> > > > repoinit files get installed in a specific order with relation
> > > > to the
> > > > artifacts?...
> > > 
> > > The repoinit configs are applied by a single
> > > SlingRepositoryInitializer [1] service which is implemented by
> > > org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
> > > 
> > > The execution order of the SlingRepositoryInitializer services is
> > > based on their service rankings [4] and the RepositoryInitializer
> > > processes its configurations in the order in which they are
> > > provided
> > > by the OSGi framework, sequentially.
> > > 
> > > All this happens before the SlingRepository service is made
> > > available [3]
> > > 
> > > The logs should help understand what's going on but IIRC it all
> > > happens in a single thread.
> > > 
> > > -Bertrand
> > > 
> > > [1]
> > > https://sling.apache.org/documentation/bundles/repository-initialization.html
> > > [2]
> > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
> > > [3]
> > > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
> > > [4]
> > > https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
> > > 
> > > 


Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Hi all,

I think I have a theory for our issues here, and it may have to do with the
fact that we are running in a heavily containerized environment
(Kubernetes). I wanted to consult the community experts to see what
you think.

On an update, our container platform tries to bring up a new container
with Sling (and our application) against the same MongoDB database that the
original, still-running container is using. This works fine when the only
thing being updated is our application bundle, but it runs into problems
when several other bundles and configurations change at the same time (some
removed, some added, some updated). I *think* the core of the problem is
that the bundles and configurations are all stored in the database itself,
and two containers with potentially different bundle versions and
configurations are attempting to use it simultaneously.

If I am right, then the core problem we need to solve is how to upgrade a
database from one Sling version to the next.

Let me know what you all think.

Regards,

Carlos

On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <ca...@redhat.com> wrote:

> Thanks Bertrand! I will continue my fact finding mission here :)
>
> Regards,
>
> Carlos
>
> On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <bd...@apache.org>
> wrote:
>
>> Hi,
>>
>> On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <ca...@redhat.com> wrote:
>> > ...do configurations from the
>> > repoinit files get installed in a specific order with relation to the
>> > artifacts?...
>>
>> The repoinit configs are applied by a single
>> SlingRepositoryInitializer [1] service which is implemented by
>> org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
>>
>> The execution order of the SlingRepositoryInitializer services is
>> based on their service rankings [4] and the RepositoryInitializer
>> processes its configurations in the order in which they are provided
>> by the OSGi framework, sequentially.
>>
>> All this happens before the SlingRepository service is made available [3]
>>
>> The logs should help understand what's going on but IIRC it all
>> happens in a single thread.
>>
>> -Bertrand
>>
>> [1]
>> https://sling.apache.org/documentation/bundles/repository-initialization.html
>> [2]
>> https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
>> [3]
>> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
>> [4]
>> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
>>
>>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Thanks Bertrand! I will continue my fact finding mission here :)

Regards,

Carlos

On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz <bd...@apache.org>
wrote:

> Hi,
>
> On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <ca...@redhat.com> wrote:
> > ...do configurations from the
> > repoinit files get installed in a specific order with relation to the
> > artifacts?...
>
> The repoinit configs are applied by a single
> SlingRepositoryInitializer [1] service which is implemented by
> org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
>
> The execution order of the SlingRepositoryInitializer services is
> based on their service rankings [4] and the RepositoryInitializer
> processes its configurations in the order in which they are provided
> by the OSGi framework, sequentially.
>
> All this happens before the SlingRepository service is made available [3]
>
> The logs should help understand what's going on but IIRC it all
> happens in a single thread.
>
> -Bertrand
>
> [1]
> https://sling.apache.org/documentation/bundles/repository-initialization.html
> [2]
> https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
> [3]
> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
> [4]
> https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
>
>

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <ca...@redhat.com> wrote:
> ...do configurations from the
> repoinit files get installed in a specific order with relation to the
> artifacts?...

The repoinit configs are applied by a single
SlingRepositoryInitializer [1] service which is implemented by
org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].

The execution order of the SlingRepositoryInitializer services is
based on their service rankings [4] and the RepositoryInitializer
processes its configurations in the order in which they are provided
by the OSGi framework, sequentially.

All this happens before the SlingRepository service is made available [3].

The logs should help understand what's going on but IIRC it all
happens in a single thread.

-Bertrand

[1] https://sling.apache.org/documentation/bundles/repository-initialization.html
[2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
[3] https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
[4] https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
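
A minimal sketch of the ordering described above, in plain Java rather than
the actual Sling code. The class and method names (RankedInitializerSketch,
Initializer, startRepository) are hypothetical, and sorting by ascending
ranking is an assumption; the source linked at [4] is the authority on the
real direction. It only illustrates the shape: initializers run one after
another on a single thread, and the repository is registered only once all
of them have finished.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: plain Java, no Sling or OSGi types involved.
public class RankedInitializerSketch {

    // Hypothetical stand-in for a registered initializer service.
    public static final class Initializer {
        final String name;
        final int ranking;

        public Initializer(String name, int ranking) {
            this.name = name;
            this.ranking = ranking;
        }
    }

    // Runs every initializer sequentially on the calling thread, sorted by
    // ranking (ascending order is an assumption of this sketch), and only
    // then "registers" the repository. Returns the steps in execution order.
    public static List<String> startRepository(List<Initializer> initializers) {
        List<Initializer> ordered = new ArrayList<>(initializers);
        ordered.sort(Comparator.comparingInt((Initializer i) -> i.ranking));
        List<String> steps = new ArrayList<>();
        for (Initializer initializer : ordered) {
            steps.add("run " + initializer.name);
        }
        steps.add("register SlingRepository");
        return steps;
    }

    public static void main(String[] args) {
        List<Initializer> services = new ArrayList<>();
        services.add(new Initializer("application-setup", 100));
        services.add(new Initializer("repoinit", 0));
        for (String step : startRepository(services)) {
            System.out.println(step);
        }
    }
}
```

If the real startup follows this shape, a consumer of the SlingRepository
service should never observe a half-applied repoinit run from the same
instance, which would point away from a slow thread as the cause.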

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
You are right Robert, the error is seen at shutdown, even if it's similar.
I continued doing a bit of debugging to understand how Sling is loading up
the different resources and had a question: do configurations from the
repoinit files get installed in a specific order with relation to the
artifacts? Do they get installed in different threads?
This might indicate whether we are seeing a slow thread in there.
Thanks again for all the help.
Carlos


On Thu, Jan 30, 2020 at 4:47 AM Robert Munteanu <ro...@apache.org> wrote:

> Hi Carlos,
>
> Yes, this may be a timing issue.
>
> I could not follow the link you sent me for some reason. I think the
> build log is the one from [1]. If that is the case, the error is
> visible at shutdown, and probably does not have the same root cause.
>
> I'd still like to get some steps to reproduce - even if it's a chance
> of 1 in 3, even if it depends on containers.
>
> Thanks,
> Robert
>
>
> [1]:
> https://builds.apache.org/job/Sling/job/sling-org-apache-sling-starter/job/master/104/console
>
> On Wed, 2020-01-29 at 21:27 -0500, Carlos Munoz wrote:
> > Robert, I checked the latest (master) pipeline build logs for the
> > starter
> > project:
> >
> >
> https://builds.apache.org/blue/organizations/jenkins/Sling%2Fsling-org-apache-sling-starter/detail/master/104/pipeline/24
> >
> > and found that there is a very similar error being reported
> > (different
> > principal and bundle), but same type of exception nonetheless.
> >
> > Carlos
> >
> >
> > On Wed, Jan 29, 2020 at 7:47 PM Carlos Munoz <ca...@redhat.com>
> > wrote:
> >
> > > Robert, I wonder if this is a timing issue. I’m not sure I
> > > understand how
> > > Sling is loading bundles and configurations, but is it possible
> > > that it
> > > could load up a bundle which needs a specific configuration before
> > > said
> > > configuration has finished loading?
> > >
> > > I mention this because we are seeing the error now on a
> > > containerized
> > > environment where resources may be more virtualized than in a local
> > > environment, where the application seems to run without any issues.
> > >
> > > Regards,
> > >
> > > Carlos
> > >
> > > On Tue, Jan 28, 2020 at 10:11 PM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > >
> > > > Hi Robert, I'm picking up this thread again since we briefly
> > > > talked about
> > > > this problem; allow me to recap:
> > > > We are attempting to migrate bundle versions for a Sling
> > > > application from
> > > > their Sling 11 versions to the latest stable versions. The
> > > > application is
> > > > running against an already populated mongo database and we are
> > > > seeing the
> > > > following exception when deploying.
> > > >
> > > > 29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup
> > > > Thread
> > > > #4] ERROR: Bundle '160' EventDispatcher: Error during dispatch.
> > > > (org.apache.sling.api.SlingException: Can't create the JCR event
> > > > listener.)
> > > > org.apache.sling.api.SlingException: Can't create the JCR event
> > > > listener.
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.registerListeners(JcrResourceProvider.java:227)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.start(JcrResourceProvider.java:182)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Handler.activate(ResourceProviderHandler.java:74)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.activate(ResourceProviderTracker.java:360)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.register(ResourceProviderTracker.java:192)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.access$200(ResourceProviderTracker.java:59)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker$1.addingService(ResourceProviderTracker.java:130)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker$1.addingService(ResourceProviderTracker.java:106)
> > > >
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > > viceTracker.java:943)
> > > >
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > > viceTracker.java:871)
> > > >
> > > > at
> > > > org.osgi.util.tracker.AbstractTracked.trackAdding(AbstractTracked
> > > > .java:256)
> > > > at
> > > > org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:
> > > > 229)
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.serviceChanged(Servi
> > > > ceTracker.java:903)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > > allback(EventDispatcher.java:990)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > > ventDispatcher.java:838)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > > Dispatcher.java:545)
> > > >
> > > > at
> > > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > > )
> > > > at
> > > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > > at
> > > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > > leContextImpl.java:328)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > > ster(AbstractComponentManager.java:906)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > > ster(AbstractComponentManager.java:892)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.RegistrationManager.changeRegis
> > > > tration(RegistrationManager.java:128)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager.regist
> > > > erService(AbstractComponentManager.java:959)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager.activa
> > > > teInternal(AbstractComponentManager.java:732)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > > ustomizer.addedService(DependencyManager.java:1045)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > > ustomizer.addedService(DependencyManager.java:999)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > > erAdded(ServiceTracker.java:1216)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > > erAdded(ServiceTracker.java:1137)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > > trackAdding(ServiceTracker.java:944)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > > track(ServiceTracker.java:880)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceC
> > > > hanged(ServiceTracker.java:1168)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.s
> > > > erviceChanged(BundleComponentActivator.java:125)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > > allback(EventDispatcher.java:990)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > > ventDispatcher.java:838)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > > Dispatcher.java:545)
> > > >
> > > > at
> > > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > > )
> > > > at
> > > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > > at
> > > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > > leContextImpl.java:328)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.register
> > > > Service(AbstractSlingRepositoryManager.java:218)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.initiali
> > > > zeAndRegisterRepositoryService(AbstractSlingRepositoryManager.jav
> > > > a:541)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.access$3
> > > > 00(AbstractSlingRepositoryManager.java:92)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager$4.run(Ab
> > > > stractSlingRepositoryManager.java:496)
> > > >
> > > > Caused by: javax.jcr.LoginException: Can neither derive user name
> > > > nor
> > > > principal names for bundle org.apache.sling.jcr.resource [154]
> > > > and sub
> > > > service observation
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(A
> > > > bstractSlingRepository2.java:387)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.JcrListenerBaseConfig.<ini
> > > > t>(JcrListenerBaseConfig.java:62)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.registerListeners(JcrResourceProvider.java:218)
> > > >
> > > > ... 41 more
> > > >
> > > >
> > > > The application deploys fine when not running against mongo, or
> > > > when
> > > > running against a clean mongo instance.
> > > >
> > > > The changes are located here for reference:
> > > >
> > > >
> https://github.com/redhataccess/pantheon/pull/219/files#diff-e93a9e4b7b62ab20d546f78f9ac775c8L33
> > > >
> > > > Any ideas on what could be going wrong?
> > > >
> > > > Regards,
> > > >
> > > > Carlos
> > > >
> > > >
> > > >
> > > > On Mon, Jan 27, 2020 at 4:57 AM Robert Munteanu <
> > > > rombert@apache.org>
> > > > wrote:
> > > >
> > > > > Happy to hear that you got it sorted out! Feel free to come
> > > > > back with
> > > > > more questions if you have any.
> > > > >
> > > > > Thanks,
> > > > > Robert
> > > > >
> > > > > On Fri, 2020-01-24 at 10:58 -0500, Carlos Munoz wrote:
> > > > > > Thanks Robert. I think we actually found out what was going
> > > > > > on: it
> > > > > > seems we
> > > > > > have a poorly defined index which was being deployed as part
> > > > > > of our
> > > > > > bundle
> > > > > > and which was interfering with some of the other indexes. As
> > > > > > soon as
> > > > > > we
> > > > > > removed it everything started working once again. We are
> > > > > > working on a
> > > > > > better index for the query right now.
> > > > > >
> > > > > > Really appreciate your willingness to help here... ++
> > > > > >
> > > > > > On Fri, Jan 24, 2020 at 5:03 AM Robert Munteanu <
> > > > > > rombert@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I tried building the app from source code but did not
> > > > > > > reproduce the
> > > > > > > problem. I guess this matches your experience - this
> > > > > > > happens only
> > > > > > > during an 'upgrade'.
> > > > > > >
> > > > > > > Can you please give me a set of steps to reproduce? Ideally
> > > > > > > without
> > > > > > > MongoDB, but if that's required leave it in :-)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Robert
> > > > > > >
> > > > > > > On Wed, 2020-01-22 at 22:08 -0500, Carlos Munoz wrote:
> > > > > > > > I double checked and we do have the mapping. We copied
> > > > > > > > all the
> > > > > > > > provisioning
> > > > > > > > files from the commit you recommended earlier [1] and
> > > > > > > > deployed
> > > > > > > > like
> > > > > > > > that.
> > > > > > > >
> > > > > > > > In fact, you can see our provisioning files here: [2] We
> > > > > > > > are only
> > > > > > > > adding a
> > > > > > > > single file with our own bundle and configurations.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > >
> https://github.com/apache/sling-org-apache-sling-starter/commit/c4f6e3b
> > > > > > > > [2]
> > > > > > > >
> > > > >
> https://github.com/redhataccess/pantheon/tree/upgrade-sling-bundles/pantheon-slingstart/src/main/provisioning
> > > > > > > > On Wed, Jan 22, 2020 at 4:54 PM Robert Munteanu <
> > > > > > > > rombert@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Wed, 2020-01-22 at 16:16 -0500, Carlos Munoz wrote:
> > > > > > > > > > Thanks for the tip Daniel!
> > > > > > > > > >
> > > > > > > > > > Robert - we were able to successfully package the
> > > > > > > > > > sling
> > > > > > > > > > starter
> > > > > > > > > > with
> > > > > > > > > > the
> > > > > > > > > > latest definitions as you pointed, but when deploying
> > > > > > > > > > on top
> > > > > > > > > > of
> > > > > > > > > > an
> > > > > > > > > > existing
> > > > > > > > > > database we started getting a JCR error:
> > > > > > > > > >
> > > > > > > > > > > javax.jcr.LoginException: Can neither derive user name nor principal names for bundle org.apache.sling.jcr.resource [152] and sub service observation
> > > > > > > > > >
> > > > > > > > > > We don't get the same error when deploying on a fresh
> > > > > > > > > > database.
> > > > > > > > >
> > > > > > > > > It seems that you have some missing service user
> > > > > > > > > mappings.
> > > > > > > > > Those
> > > > > > > > > might
> > > > > > > > > be required by newer versions of the bundles that you
> > > > > > > > > just
> > > > > > > > > consumed. In
> > > > > > > > > the Sling Starter the current mapping is defined at
> > > > > > > > > [1].
> > > > > > > > >
> > > > > > > > > Does adding that as a configuration to your application
> > > > > > > > > help?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]:
> > > > > > > > >
> > > > >
> https://github.com/apache/sling-org-apache-sling-starter/blob/7eac121fc3f00c95ef5b8ac38133f6796a4a6c08/src/main/provisioning/sling.txt#L199-L202
> > > > >
> > > > > --
> > >
> > > Carlos A. Muñoz
> > >
> > > Manager, Software Engineering - Customer Platform
> > >
> > > Red Hat <https://www.redhat.com>
> > > <https://red.ht/sig>
> > >
>
>
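
For readers following the quoted trace: the LoginException says that no
service user could be resolved for the bundle org.apache.sling.jcr.resource
and the sub service "observation", which is why a missing service user
mapping was the first suspect. The sketch below is a simplified, hypothetical
model of such a lookup and not the Sling ServiceUserMapper implementation. It
assumes mapping entries of the form "bundle[:subService]=userName", assumes a
fallback from the sub service entry to a bundle-wide entry, and uses the
made-up user name "example-service-user".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of a service user mapping table; names are hypothetical.
public class ServiceUserMappingSketch {

    private final Map<String, String> userByService = new HashMap<>();

    // Registers one entry of the assumed form "bundle[:subService]=userName".
    public void addMapping(String entry) {
        int eq = entry.indexOf('=');
        if (eq <= 0 || eq == entry.length() - 1) {
            throw new IllegalArgumentException("Malformed mapping: " + entry);
        }
        userByService.put(entry.substring(0, eq), entry.substring(eq + 1));
    }

    // Tries "bundle:subService" first, then falls back to a bundle-wide
    // entry (the fallback is an assumption of this sketch). An empty result
    // corresponds to the "can neither derive user name" situation.
    public Optional<String> resolve(String bundle, String subService) {
        String user = userByService.get(bundle + ":" + subService);
        if (user == null) {
            user = userByService.get(bundle);
        }
        return Optional.ofNullable(user);
    }

    public static void main(String[] args) {
        ServiceUserMappingSketch mapper = new ServiceUserMappingSketch();
        String bundle = "org.apache.sling.jcr.resource";

        // No mapping yet: nothing can be resolved for the sub service.
        System.out.println(mapper.resolve(bundle, "observation").isPresent());

        // Once a mapping is present, the lookup yields a user name.
        mapper.addMapping(bundle + ":observation=example-service-user");
        System.out.println(mapper.resolve(bundle, "observation").orElse("<none>"));
    }
}
```

In this model the failure depends only on whether the mapping is visible at
lookup time, so a mapping that exists in the provisioning files but has not
yet been applied would fail in the same way as one that is missing entirely.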

Re: Error migrating to latest version of the bundles - Can neither derive user name nor principal names (was: Slow queries and unexpected results)

Posted by Carlos Munoz <ca...@redhat.com>.
Sure thing.

Steps to reproduce are fairly simple; we actually saw the same result when
building a vanilla Sling Starter from master [1]. But if you want to be
closer to our specific problem you can build from our application's starter
module [2].

We containerize the application using the Dockerfile in the same repo [3],
and we run it on a Kubernetes-orchestrated server against a MongoDB cluster
with 3 nodes (note that there is some latency due to the database).

We run the application from the resulting container image and we see the
error as Sling is coming up.

Let me know if this is enough to go by.

As a side question, I was wondering if the order of the provisioning files
makes any difference to the sequence in which bundles and configurations
are installed. If it does, we might be able to reorganize those files to
ensure all configurations are loaded before the bundles using them.

Regards,

Carlos

[1] https://github.com/apache/sling-org-apache-sling-starter
[2] https://github.com/redhataccess/pantheon/pull/221
[3]
https://github.com/redhataccess/pantheon/blob/master/container/Dockerfile

On Thu, Jan 30, 2020 at 4:47 AM Robert Munteanu <ro...@apache.org> wrote:

> Hi Carlos,
>
> Yes, this may be a timing issue.
>
> I could not follow the link you sent me for some reason. I think the
> build log is the one from [1]. If that is the case, the error is
> visible at shutdown, and probably does not have the same root cause.
>
> I'd still like to get some steps to reproduce - even if it's a chance
> of 1 in 3, even if it depends on containers.
>
> Thanks,
> Robert
>
>
> [1]:
> https://builds.apache.org/job/Sling/job/sling-org-apache-sling-starter/job/master/104/console
>
> On Wed, 2020-01-29 at 21:27 -0500, Carlos Munoz wrote:
> > Robert, I checked the latest (master) pipeline build logs for the
> > starter
> > project:
> >
> >
> https://builds.apache.org/blue/organizations/jenkins/Sling%2Fsling-org-apache-sling-starter/detail/master/104/pipeline/24
> >
> > and found that there is a very similar error being reported
> > (different
> > principal and bundle), but same type of exception nonetheless.
> >
> > Carlos
> >
> >
> > On Wed, Jan 29, 2020 at 7:47 PM Carlos Munoz <ca...@redhat.com>
> > wrote:
> >
> > > Robert, I wonder if this is a timing issue. I’m not sure I
> > > understand how
> > > Sling is loading bundles and configurations, but is it possible
> > > that it
> > > could load up a bundle which needs a specific configuration before
> > > said
> > > configuration has finished loading?
> > >
> > > I mention this because we are seeing the error now on a
> > > containerized
> > > environment where resources may be more virtualized than in a local
> > > environment, where the application seems to run without any issues.
> > >
> > > Regards,
> > >
> > > Carlos
> > >
> > > On Tue, Jan 28, 2020 at 10:11 PM Carlos Munoz <ca...@redhat.com>
> > > wrote:
> > >
> > > > Hi Robert, I'm picking up this thread again since we briefly
> > > > talked about
> > > > this problem; allow me to recap:
> > > > We are attempting to migrate bundle versions for a Sling
> > > > application from
> > > > their Sling 11 versions to the latest stable versions. The
> > > > application is
> > > > running against an already populated mongo database and we are
> > > > seeing the
> > > > following exception when deploying.
> > > >
> > > > 29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup
> > > > Thread
> > > > #4] ERROR: Bundle '160' EventDispatcher: Error during dispatch.
> > > > (org.apache.sling.api.SlingException: Can't create the JCR event
> > > > listener.)
> > > > org.apache.sling.api.SlingException: Can't create the JCR event
> > > > listener.
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.registerListeners(JcrResourceProvider.java:227)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.start(JcrResourceProvider.java:182)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Handler.activate(ResourceProviderHandler.java:74)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.activate(ResourceProviderTracker.java:360)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.register(ResourceProviderTracker.java:192)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker.access$200(ResourceProviderTracker.java:59)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker$1.addingService(ResourceProviderTracker.java:130)
> > > >
> > > > at
> > > > org.apache.sling.resourceresolver.impl.providers.ResourceProvider
> > > > Tracker$1.addingService(ResourceProviderTracker.java:106)
> > > >
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > > viceTracker.java:943)
> > > >
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(Ser
> > > > viceTracker.java:871)
> > > >
> > > > at
> > > > org.osgi.util.tracker.AbstractTracked.trackAdding(AbstractTracked
> > > > .java:256)
> > > > at
> > > > org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:
> > > > 229)
> > > > at
> > > > org.osgi.util.tracker.ServiceTracker$Tracked.serviceChanged(Servi
> > > > ceTracker.java:903)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > > allback(EventDispatcher.java:990)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > > ventDispatcher.java:838)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > > Dispatcher.java:545)
> > > >
> > > > at
> > > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > > )
> > > > at
> > > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > > at
> > > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > > leContextImpl.java:328)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > > ster(AbstractComponentManager.java:906)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager$3.regi
> > > > ster(AbstractComponentManager.java:892)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.RegistrationManager.changeRegis
> > > > tration(RegistrationManager.java:128)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager.regist
> > > > erService(AbstractComponentManager.java:959)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.AbstractComponentManager.activa
> > > > teInternal(AbstractComponentManager.java:732)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > > ustomizer.addedService(DependencyManager.java:1045)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.DependencyManager$SingleStaticC
> > > > ustomizer.addedService(DependencyManager.java:999)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > > erAdded(ServiceTracker.java:1216)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customiz
> > > > erAdded(ServiceTracker.java:1137)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > > trackAdding(ServiceTracker.java:944)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.
> > > > track(ServiceTracker.java:880)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceC
> > > > hanged(ServiceTracker.java:1168)
> > > >
> > > > at
> > > > org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.s
> > > > erviceChanged(BundleComponentActivator.java:125)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.invokeServiceListenerC
> > > > allback(EventDispatcher.java:990)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireEventImmediately(E
> > > > ventDispatcher.java:838)
> > > >
> > > > at
> > > > org.apache.felix.framework.EventDispatcher.fireServiceEvent(Event
> > > > Dispatcher.java:545)
> > > >
> > > > at
> > > > org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4833
> > > > )
> > > > at
> > > > org.apache.felix.framework.Felix.registerService(Felix.java:3804)
> > > > at
> > > > org.apache.felix.framework.BundleContextImpl.registerService(Bund
> > > > leContextImpl.java:328)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.register
> > > > Service(AbstractSlingRepositoryManager.java:218)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.initiali
> > > > zeAndRegisterRepositoryService(AbstractSlingRepositoryManager.jav
> > > > a:541)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager.access$3
> > > > 00(AbstractSlingRepositoryManager.java:92)
> > > >
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepositoryManager$4.run(Ab
> > > > stractSlingRepositoryManager.java:496)
> > > >
> > > > Caused by: javax.jcr.LoginException: Can neither derive user name
> > > > nor
> > > > principal names for bundle org.apache.sling.jcr.resource [154]
> > > > and sub
> > > > service observation
> > > > at
> > > > org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(A
> > > > bstractSlingRepository2.java:387)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.JcrListenerBaseConfig.<ini
> > > > t>(JcrListenerBaseConfig.java:62)
> > > >
> > > > at
> > > > org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProv
> > > > ider.registerListeners(JcrResourceProvider.java:218)
> > > >
> > > > ... 41 more
> > > >
> > > >
> > > > The application deploys fine when not running against mongo, or
> > > > when
> > > > running against a clean mongo instance.
> > > >
> > > > The changes are located here for reference:
> > > >
> > > >
> https://github.com/redhataccess/pantheon/pull/219/files#diff-e93a9e4b7b62ab20d546f78f9ac775c8L33
> > > >
> > > > Any ideas on what could be going wrong?
> > > >
> > > > Regards,
> > > >
> > > > Carlos
> > > >
> > > >
> > > >
> > > > On Mon, Jan 27, 2020 at 4:57 AM Robert Munteanu <
> > > > rombert@apache.org>
> > > > wrote:
> > > >
> > > > > Happy to hear that you got it sorted out! Feel free to come
> > > > > back with
> > > > > more questions if you have any.
> > > > >
> > > > > Thanks,
> > > > > Robert
> > > > >
> > > > > On Fri, 2020-01-24 at 10:58 -0500, Carlos Munoz wrote:
> > > > > > > Thanks Robert. I think we actually found out what was going on:
> > > > > > > it seems we have a poorly defined index which was being deployed
> > > > > > > as part of our bundle and which was interfering with some of the
> > > > > > > other indexes. As soon as we removed it, everything started
> > > > > > > working once again. We are working on a better index for the
> > > > > > > query right now.
> > > > > >
> > > > > > Really appreciate your willingness to help here... ++
> > > > > >
> > > > > > > On Fri, Jan 24, 2020 at 5:03 AM Robert Munteanu <rombert@apache.org> wrote:
> > > > > >
> > > > > > > > I tried building the app from source code but did not reproduce
> > > > > > > > the problem. I guess this matches your experience - this happens
> > > > > > > > only during an 'upgrade'.
> > > > > > > >
> > > > > > > > Can you please give me a set of steps to reproduce? Ideally
> > > > > > > > without MongoDB, but if that's required leave it in :-)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Robert
> > > > > > >
> > > > > > > On Wed, 2020-01-22 at 22:08 -0500, Carlos Munoz wrote:
> > > > > > > > > I double checked and we do have the mapping. We copied all the
> > > > > > > > > provisioning files from the commit you recommended earlier [1]
> > > > > > > > > and deployed like that.
> > > > > > > > >
> > > > > > > > > In fact, you can see our provisioning files here: [2]. We are
> > > > > > > > > only adding a single file with our own bundle and
> > > > > > > > > configurations.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > >
> https://github.com/apache/sling-org-apache-sling-starter/commit/c4f6e3b
> > > > > > > > [2]
> > > > > > > >
> > > > >
> https://github.com/redhataccess/pantheon/tree/upgrade-sling-bundles/pantheon-slingstart/src/main/provisioning
> > > > > > > > > On Wed, Jan 22, 2020 at 4:54 PM Robert Munteanu <rombert@apache.org> wrote:
> > > > > > > >
> > > > > > > > > On Wed, 2020-01-22 at 16:16 -0500, Carlos Munoz wrote:
> > > > > > > > > > Thanks for the tip Daniel!
> > > > > > > > > >
> > > > > > > > > > > Robert - we were able to successfully package the Sling
> > > > > > > > > > > Starter with the latest definitions as you pointed out,
> > > > > > > > > > > but when deploying on top of an existing database we
> > > > > > > > > > > started getting a JCR error:
> > > > > > > > > > >
> > > > > > > > > > > javax.jcr.LoginException: Can neither derive user name nor principal names for bundle org.apache.sling.jcr.resource [152] and sub service observation
> > > > > > > > > > >
> > > > > > > > > > > We don't get the same error when deploying on a fresh
> > > > > > > > > > > database.
> > > > > > > > >
> > > > > > > > > > It seems that you have some missing service user mappings.
> > > > > > > > > > Those might be required by newer versions of the bundles
> > > > > > > > > > that you just consumed. In the Sling Starter the current
> > > > > > > > > > mapping is defined at [1].
> > > > > > > > > >
> > > > > > > > > > Does adding that as a configuration to your application
> > > > > > > > > > help?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1]:
> > > > > > > > >
> > > > >
> https://github.com/apache/sling-org-apache-sling-starter/blob/7eac121fc3f00c95ef5b8ac38133f6796a4a6c08/src/main/provisioning/sling.txt#L199-L202
> > > > >
> > > > > --
> > >
> > > Carlos A. Muñoz
> > >
> > > Manager, Software Engineering - Customer Platform
> > >
> > > Red Hat <https://www.redhat.com>
> > > <https://red.ht/sig>
> > >
>
>
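
To make the "missing service user mappings" diagnosis in this thread more concrete, here is a minimal Java sketch of the kind of lookup involved. It is not Sling's ServiceUserMapper implementation. It only assumes that mapping entries have the form "bundle[:subService]=user" and that a lookup with a sub service may fall back to a bundle-only entry. The class name ServiceUserMappingSketch and the user name example-service-user are hypothetical and do not come from the thread or from the Sling Starter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; not Sling's ServiceUserMapper implementation. */
public class ServiceUserMappingSketch {

    // Key: "bundle" or "bundle:subService"; value: the mapped user name.
    private final Map<String, String> mappings = new LinkedHashMap<>();

    /** Registers one entry of the assumed form "bundle[:subService]=user". */
    public void add(String entry) {
        int eq = entry.indexOf('=');
        if (eq <= 0 || eq == entry.length() - 1) {
            throw new IllegalArgumentException("Malformed mapping: " + entry);
        }
        mappings.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
    }

    /**
     * Resolves the user for a bundle and optional sub service.
     * Tries the "bundle:subService" key first, then the bundle-only key
     * (the fallback is an assumption of this sketch).
     */
    public Optional<String> resolve(String bundle, String subService) {
        String user = subService == null ? null : mappings.get(bundle + ":" + subService);
        if (user == null) {
            user = mappings.get(bundle);
        }
        return Optional.ofNullable(user);
    }

    public static void main(String[] args) {
        ServiceUserMappingSketch mapper = new ServiceUserMappingSketch();
        String bundle = "org.apache.sling.jcr.resource";

        // No entry yet: an empty result is the situation in which a
        // "Can neither derive user name nor principal names" error is reported.
        System.out.println("before: " + mapper.resolve(bundle, "observation").isPresent());

        // "example-service-user" is a hypothetical user name.
        mapper.add(bundle + ":observation=example-service-user");
        System.out.println("after: " + mapper.resolve(bundle, "observation").isPresent());
    }
}
```

In a real deployment the entries come from OSGi configuration rather than code, so the practical check is whether the configuration that carries the mapping is actually present and active when the bundle starts.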