You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@solr.apache.org by Gus Heck <gu...@gmail.com> on 2021/08/16 15:37:37 UTC

Solr and Jetty and Servlets

*TLDR:* I've got a working branch
<https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer &
our startup process is extracted from SolrDispatch Filter. Do other folks
think that is interesting enough that I should make a JIRA and/or PR with
intention to make this change?

*Details:*

Jetty is a servlet container, yet we more or less ignore it by stuffing
everything into a single filter (almost, admin UI is served separately).
I'm sure there are lots of historical reasons for this, probably including
servlet containers and their specs were much less mature when solr was
first started. Maybe also the early authors were more focused on making
search work than leveraging what the container could do for them (but I
wasn't there for that so that's just a guess).

The result is that we have a couple of very large classes that are touched
by almost every request, and they have a lot of conditional logic trying to
decide what the user is asking. Normally this sort of dispatch is done by
the container based on the request URL to direct it to the appropriate
servlet. Solr does a LOT of different things so this code is extremely
tricky and complex to understand. Specifically, I'm speaking of
SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
two classes probably makes them harder to understand.

It seems to me (and this mail is asking if you agree) that these classes
are long overdue for some subdivision. The most obvious thing to pull out
is all the admin calls. Admin code paths really have little or nothing to
do with query or update code paths since there are no documents to route or
sub-requests to some subset of nodes.

The primary obstacle to any such separation and simplification is that most
requests have some interaction with CoreContainer, and the things it holds,
and this is initialized and held by a field in SolrDispatchFilter. After
spending a significant chunk of time reading this code in the prior weeks
and a timely and motivating conversation with Eric Pugh, I dumped a chunk
of my weekend into an experiment to see if I could pull CoreContainer out
of the dispatch filter, and leverage the facilities of our servlet
container.

That wasn't too terribly hard, but keeping JettySolrRunner happy was very
confusing, and worrisome since I've realized it's not respecting our
web.xml at all, and any configuration in web.xml needs to be duplicated for
our tests in JettySolrRunner (tangent alert)

The result is that CoreContainer is now held by a class called CoreService
(please help me name things if you don't like my names :) ). CoreService is
a ServletContextListener, appropriately configured in web.xml, and has a
static method that can be used to get a reference to the CoreContainer
corresponding to the ServletContext in which code wanting a core container
is running (this supports having multiple JettySolrRunners in tests, though
probably never has more than one CoreContainer in the running application)

I achieved this in 4 stages shown here:
https://github.com/gus-asf/solr/tree/servlet-solr

Ignore the AdminServlet class, it's a placeholder, and can be subtracted
without harm.

Since the current state of the code in that branch is apparently
test-stable (4 runs of check in a row passing, none slower than any run of
3 runs of main, both as low as 10.5 min if I don't continue working on the
machine)...

Do we want to push this refactor in now to avoid making a huge ball of
changes that gets harder and harder to merge? The next push point would
probably be when AdminServlet was functional (if/when that happens) (and we
could not push that class for now).

If you read this far, thanks :) I wasn't sure how feasible this would be so
I felt the need to prove it to my self in code before wasting your time,
but please don't hesitate to point to costs I might be missing or anything
that looks like a mistake, or say why this was a total waste of time :)

Thoughts?

-Gus
-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
Gonna give folks another week or so to review it, then I'll proceed with
addressing review feedback whatever cleanup seems appropriate...

On Mon, Aug 23, 2021 at 8:56 AM Jason Gerlowski <ge...@gmail.com>
wrote:

> I don't know enough about Jetty or startup times or webapp packaging
> options to respond to much of the later discussion on this thread.
>
> But I do know how hard HttpSolrCall and SolrDispatchFilter are to
> understand and how painful they've been to maintain.  They've slowed
> my own development down in the past and have been a source of at least
> a few bugs (I'm remembering some auth-framework bugs in particular,
> though I'm sure there have been others.)  Anyway, speaking from that
> limited viewpoint, I'd love to see the work in Gus' PR land.  Thanks
> for the weekend-experiment man!
>
> Jason
>
> On Thu, Aug 19, 2021 at 12:57 PM Mark Miller <ma...@gmail.com>
> wrote:
> >
> > You could say that about a 1000 of the things I could bring it up. I’m
> sure I’ve said it.
> >
> > So I what I did was eliminate the big discussions comment. And ripped
> through thousands. So IMO, there you go. A nice chilling affect that has
> little to do with nuts and bolts software engineering seizing up the
> majority of developers instincts.
> >
> > That and the fact that it’s hard to see those little bit of work paths.
> They are covered and obscured by weeds and thorns. But they are there.
> >
> > My schedule is full of stuff that has nothing to do with these things.
> I’m not reopening any discussions myself. But I will match good work.
> >
> > If someone thinks QuickStart or something along the lines of Uwe’s
> design is worth tackling - that startup performance matters - I’ll add an
> option to eat the next bottleneck - dynamically and concurrently loading
> the 10-15k classes. If someone makes a half way sensible rest api on the
> server - I’ll work on a much stronger http2 api.
> >
> > The technical work is not hard. The other shit is maybe not that hard. I
> never found it hard. But I gave it up. And before I did, I told a couple,
> look at ethereum. Look at the love meetings they do. Look at the email
> interactions. Look at the conference. Look at the abstract common goals and
> agreement. Look at the in the ground disfunction and disagreement. And
> maybe try to do something.
> >
> > MRM
> >
> > On Thu, Aug 19, 2021 at 8:06 AM Alexandre Rafalovitch <
> arafalov@gmail.com> wrote:
> >>
> >> > But a little more effort on those json files that make up V2
> >> I looked into that and during the in-depth testing realised that V2
> >> did not support default values (it writes out current values only),
> >> nor did it have a proper round-trip story, especially for the nested
> >> configurations. So, it was - architecturally - not at parity with the
> >> way Solr did configuration.
> >>
> >> I documented my findings in a bunch of JIRAs with the umbrella in
> >> SOLR-14795 but stopped after that, as I could not figure out the way
> >> to do that "little more effort" without reopening big discussions.
> >>
> >> Not saying it is not possible (or to derail the parent conversation),
> >> just my two cents on attempting that kind of "a small amount of
> >> effort".
> >>
> >> Regards,
> >>    Alex.
> >>
> >> On Thu, 19 Aug 2021 at 02:05, Mark Miller <ma...@gmail.com>
> wrote:
> >> >
> >> > The gap is not ideal and I’m not advocating for it, but it does help
> keep you honest. There is gap between a mini cluster and a packed dist
> regardless and it’s really something that should be covered by testing even
> with more of a lineup.
> >> >
> >> > I’m very into dependency injection in Solr. I was adding juice at the
> end of branch 2.5. The benefit to testing is well known, but wasn’t even
> looking at that.
> >> >
> >> > Our class dependencies are a major hindrance. From everywhere, you
> need access to the CoreContainer. And ZkController. And then make chained
> calls all over the place to reach back and forth into each other.
> >> >
> >> > Need a zksolrclient? Start chaining method calls until you find it.
> ZkstateReader? Pretty much anything. You find a friendly root and start
> reaching back and forth. It’s a very verbose and complicated framework. And
> with issues depending on the class and implementation.  Because threads are
> started in constructors and everything is so interdependent, you actually
> race spinning the wild up.
> >> >
> >> > Getting the construction, and lifecycle and access to all the core
> widely used classes is delicate work. And is essentially given to everyone,
> everywhere.
> >> >
> >> > So I built everything at the top, in a guice configuration. Lifecycle
> and construction and dependencies mostly collocated in the code in one
> spot. Classes that need access get those objects injected. No looking for,
> chaining for, constructing a second or third instance for …
> >> >
> >> > And then what I was starting on was, because we have all these
> complex objects and difficult to understand lifecycle and access, and in
> many many places - mistakes are made (Someone closes something they
> shouldn’t. Makes a call they are not supposed to) And so I started working
> on having most of the key, long lived objects managed by a small core layer
> of code. And the objects that got injected where getting specialized - only
> the core lifecycle code can close a zksolrclient. The method doesn’t exist
> or throws and exception for the request code. Requests can access the
> resources - easily discoverable, and without the ability to mess up with
> the unnecessary power they are given.
> >> >
> >> > So just like api V2, you can find a hint of this in the dual api for
> zk access. And like V2, half baked, undocumented. Team tackled by like one,
> for a while, to some limited extent. And so again you have an independent
> idea built on a gold nugget that ends up increasing complexity and
> variation and surface area and ability to understand. The core of the code
> always large enough to barrel along in the same direction while some
> independent, time limited efforts, start pulling the mass wider in
> objectively reasonable goals. But it’s just a wider mass barreling forward.
> >> >
> >> > Essentially to say, yes. Dependency injection.  You can shrink the
> world. Give the average developer half a chance.
> >> >
> >> > Request dispatching and the rest api should be the same deal. Out of
> the way. Done with a focus to shrink the world and remove the power and let
> search developers focus on search. I look at the V2 api and think, man,
> it’s like someone saw gold and then went after a nickel. We shouldn’t be in
> the rest api and dispatching business.
> >> >
> >> > Let’s say you implement an OpenApi or SwaggerApi. You get tools. Web,
> ide, cmd line.   You get code generation. Server and client. Editors. You
> get automated documentation creation. You get a standard and compatibility
> that’s handled by a huge community dedicated to it. When devs have to work
> on it, it’s simple, it’s documented and tutorialed and talked about -
> outside of the project and the one or two hobbiests that have taken a
> passing interest in a rest apia and dispatching.
> >> >
> >> > And V2 seems to look at that and take a bite or two. But leave almost
> eveything on the table. A lot of code. Very non standard custom,
> undocumented code. Non of the tool support. Pretty heavy work to even
> unwind it with few  pointers to help.  Still no standard. Self documenting
> at a very low level. And with very little of the benefit. When you follow
> these specs, not only is most of the work down for you beyond
> specification, but your ide auto completes your urls. Tools auto mock
> testing. A whole bunch of crap pushed away.
> >> >
> >> > The stuff that Solr is actually good at is the core of the actual
> search functionality. But the amount of out of scope and poor code
> surrounding that is what drags in the whole system. And most of it can be
> kicked right out of the project and into code and specs solved and
> maintained by people that actually do that work for a living.
> >> >
> >> > Here and there we have some special needs. No doubt. But a little
> more effort on those json files that make up V2? And at the push of a
> button you have the basic working code for a full client. For a jetty
> server implementation. And very pretty, full, exampled, generated
> documentation. Just a click and a small amount of effort. Even if you say,
> while the search api, and some of the performance and special case stuff…
> okay, okay, I think there’s likely a half way point, but let’s just say you
> take the whole admin api off the table. Beautiful. At least the rest is
> search related.
> >> >
> >> >
> >> > MRM
> >> >
> >> > On Wed, Aug 18, 2021 at 11:51 AM Gus Heck <gu...@gmail.com> wrote:
> >> >>
> >> >> The thing that concerned me most about JettySolrRunner is that it
> differed from what we use in production essentially leaving a gap of
> untested functionality, so if we move one or the other to match I'm pretty
> much happy. The thing it (and ServeletContextHandler) don't do is start
> anything else without instructions, which is the good and bad. I've not had
> any microservice oriented clients so startup time hasn't been a thing I
> thought much about but all the points above make sense in that light.
> >> >>
> >> >> Since SIP-6 doesn't contemplate anything like replacing
> jetty/servlets entirely I don't think anything I did here stands in its
> way, just a slightly longer list of things to instantiate (conveniently
> documented in web.xml) but code review would be a happy thing in case I
> missed something on that front.
> >> >>
> >> >> Even what Uwe describes seems compatible, and however we go it would
> be MUCH better to have the ability to inject dependencies into the things
> we start. Because of the unit tests we can't ever use any singleton that
> has state, so that's out and threadlocals are too broad, so without control
> to inject I was forced to set up a delicate dance with a weak map keyed on
> the servlet contexts which is not my favorite thing, but it worked.
> >> >>
> >> >> -Gus
> >> >>
> >> >> On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com>
> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> Clearly a refactor is needed here, thanks Gus for diving in deep.
> >> >>>
> >> >>> Like Uwe I don't want us to be more dependent upon web.xml or
> Jetty's startup logic. That's why I filed SIP-6 (
> https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process)
> to try to describe the vision that Uwe just described. If such a bootstrap
> redesign makes the dispatch cleanup easier or harder I don't know. Perhaps
> one should be done before the other? Or shuold be viewed as part of the
> same rewrite?
> >> >>>
> >> >>> Jan
> >> >>>
> >> >>> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> Thanks Mark, I fully agree with you on this: Booting up jetty with
> start.jar and using a web application is dramatically slower.
> >> >>>
> >> >>> Nevertheless, I think that we should go away with the web
> application alltogether. We can still stay with the servlet API, but just –
> as in Gus patch – to split responsibilities and go away with a singleton
> ServletFilter, although from my own experience, I figured out that
> Servletapis’s URI handling is far away from ideal. In most cases at end I
> always landed in a separate “master” servlet (hooked to “/*”) that only
> parses the PathInfo with regular expressions (one thing that does not work
> with servlet api) and then delegates using RequestDispatcher to several
> more specific servlets or returns 404 not found. This setup would also help
> to split the huge DispatchFilter mess, but still the URI parsing logic
> would be centralized. But this servlet would only take care of the
> dispatching, so DispatchServlet would be fine. But it should only dispatch,
> nothing else.
> >> >>>
> >> >>> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to
> the main distribution and remove start.jar completely. Also remove
> jetty-webapp.jar and related dependencies. In Boostrap just start Jetty
> embedded, add listen ip, port numbers, etc. We can still use the config API
> of Jetty for this, so maybe put this into an XML file (although XML is a
> bad format for this stuff, goes back to web applications). Then Bootsrap
> starts the Corecontainer and the embedded Zookeeper (this is another
> horrible thing in Solr: the embedded zookeper starts somehow as part of the
> web application) and hooks the servlets into a (at first) single root
> ServletContextHandler. This is upside down. So Solr starts with a Bootstrap
> class, starts zookeeper, jetty and hands over to CoreContainer.
> >> >>>
> >> >>> Some other ideas (not sure if they are already implemented in the
> patch): Maybe we should install a separate
> ServletContextHandler/Classloader for each Core. This would allow us to go
> away with our own classloader magic. Bootstrap could also do this: It
> starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin
> UI and the Core/Collection management and then hands over to the
> CoreContainer to initialize all Cores. Each CoreContainer gets his own
> ServletContextHandler and can therefore easily be loaded unloaded from
> Jetty.
> >> >>>
> >> >>> I wrote many other apps in my daily live like this, the startup
> time (especially for microservices) is great. A Jetty starting up in
> milliseconds and going to service is unbeatable.
> >> >>>
> >> >>> Uwe
> >> >>>
> >> >>> -----
> >> >>> Uwe Schindler
> >> >>> Achterdiek 19, D-28357 Bremen
> >> >>> https://www.thetaphi.de
> >> >>> eMail: uwe@thetaphi.de
> >> >>>
> >> >>> From: Mark Miller <ma...@gmail.com>
> >> >>> Sent: Wednesday, August 18, 2021 9:08 AM
> >> >>> To: Gus Heck <gu...@gmail.com>
> >> >>> Cc: dev@solr.apache.org
> >> >>> Subject: Re: Solr and Jetty and Servlets
> >> >>>
> >> >>> Agreement or not, I wouldn’t step in front of whatever you did.
> >> >>>
> >> >>> I doubt anyone is going to take on actually pulling out of a web
> app because it’s easier to lean on it for configuration and I doubt start
> up speed is a very high priority for most.
> >> >>>
> >> >>> It had never really occurred to me in the past that JettySolrRunner
> was so much faster. I just know it because of driving the test down to
> ridiculously times, starting up and stopping mini clusters of whatever size
> in literally no time. With no QuickStart setup for embedded Jetty.
> >> >>>
> >> >>> So than starting up standalone Solr, it became apparent how much
> slower it was. So I started poking around. Cloud guys like those at google
> had already been feeling pain - preferring to be able to spin up and down
> relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s
> thing how they were dealing with it. And it’s simple, but it’s still easily
> non comparable at the end of the day. Solr ships with like 10,000 classes
> across a ton of jars, even with QuickStart, you have to follow the spec of
> what to do and what might be done by any given webapp - though as
> JettySolrRunner shows, we don’t actually need any of it. QuickStart is
> trying to find every trick to speed up following the container and webapp
> spec. Injecting the dispatch filter just tosses it all out :)
> >> >>>
> >> >>> Any, may point was not that I disagree and you should make Solr
> inject a dispatch filter - my point is just that it doesn’t really matter
> to dispatch. You can do just as much via Jetty either way.
> >> >>>
> >> >>> And anyone that doesn’t agree about “something should be done” is
> either lying or not looking. So I wouldn’t worry about agreement. The
> problem is that it’s quicksand that people are avoiding, not that anyone
> that looks can’t see it a huge rats nest. Look at the V2 API. Someone
> actually wanted to work towards getting out of that hole. But at the end of
> the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But
> look at it - there was hope there. Hope that one day, dispatching would be
> better. Rather than twice as complicated. I don’t know that hope lives on.
> But no one gonna stop you on reasonable technical sense with any proper
> license.
> >> >>>
> >> >>> Mark
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> *ServletContextHandler not ServletContext
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> Ok nope, since that only happens with WebAppContext, not
> ServletContext (the parent class). whew :)
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml
> and thus lack of metadata-complete=true in web.xml means that we wind up
> scanning for annotations?
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> https://issues.apache.org/jira/browse/SOLR-15590
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> Ok well it would be interesting to compare quickstarting vs
> JettySolrRunner, and reading up on quickstart gives ideas about
> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
> tangent. Either way we are still in a container and I think I hear some
> agreement that something should be done about the dispatch here, and both
> of you seem to agree that an actual servlet would make more sense than a
> filter, so I'll make a ticket/pr to make it easier to track & easier to
> read the code.
> >> >>>
> >> >>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
> wrote:
> >> >>>
> >> >>> The downside to respecting web.xml and making JettySolrServer serve
> a webapp is that loading a webapp is very expensive and slow in
> comparison.  JettySolrServer actually starts up extremely quickly. It’s
> almost more appealing to change the Server to use the JettySolrServer
> strategy. It’s so slow to load a webapp because of all the things it needs
> to support and scan jars for - in a kind of JavaEE situation, though not as
> bad. Jetty QuickStart does improve the situation if used at least. But
> there is really no need to eat the whole webapp standard for a non webapp
> app to have Jetty manage more of the dispatching. I always wondering about
> the motivation / upside of hacking in a straight servlet myself. But for a
> non webapp stuck in a servlet container, it’s actually a beautiful move
> that side steps a bunch of slow crap even better than the QuickStart pushes
> ever will, and still allows for matching any support we would want.
> >> >>>
> >> >>> The HttpSolrCallV2 extending HttpSolrCall is horrendous.
> Personally, I made a slim SolrCall base class that each extends. All that
> logic is hairy enough to follow without them interleaving and sharing in a
> kind of wild collaboration. It’s pretty unfortunate they both still exist.
> >> >>>
> >> >>> You are right, leaving that path in that state is a poor idea. If
> there was anything that could be considered the “hot” path, it’s right
> there - feverishly checking and ripping up streams for anyone and anything.
> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the
> path down? Do a dance? Treat the string like a date and see if that works.
> >> >>>
> >> >>> So yeah, agreed, nobody does or would dispatch this way, you have
> to frog boil into it. It’s slow, almost incomprehensible, and incredibly
> good at holding onto life, even building on it. But being a webapp and
> parsing web.xml has little to do with dispatching and having sensible http
> api that doesn’t work like it’s a perl compiler.
> >> >>>
> >> >>> Anyway, I can’t imagine trying to trace through that code with any
> solid feeling about having a handle on it via inspection. But just like
> Perl, if you debug / trace log the hell out of it, it is all actually
> pretty simple to adjust and reign in.
> >> >>>
> >> >>>
> >> >>> MRM
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> So I'm not interested in *adhering* to anything here, just using
> stuff where it helps... I see tools sitting on the shelf, already bought
> and paid for and carried with us to every job but then they sit in the back
> of the truck mostly unused... and (I think) they look useful.
> >> >>>
> >> >>> This should not be seen as in any way advocating a return to war
> file deployment or "choose your container". All of what I would suggest
> should likely be compatible with a jetty startup controlled by our code (I
> assume we can convince it to read web.xml properly when doing so).
> Certainly point out anything that would interfere with that.
> >> >>>
> >> >>> Here we sit in a container that is a tool box containing:
> >> >>>
> >> >>> A nice mechanism for initializing stuff at the application level,
> (ServletContextListener) and a well defined guarantee that it runs before
> any filter or servlet,
> >> >>> An automatic shutdown mechanism that it will call for us on
> graceful shutdown (ServletContextListner again).
> >> >>> A nice layered filter chain mechanism which could have been used to
> layer in things like tracing and authentication, and close shields etc as
> small succinct filters rather than weaving them into an ever more complex
> filter & call class.
> >> >>> In more recent versions of the spec, for listeners defined in
> web.xml the order is also guaranteed.
> >> >>> Servlet classes that are *already* set up to distinguish http verbs
> automatically when desired
> >> >>>
> >> >>> So why is this better? Because monster classes that do a hundred
> things are really hard to understand and maintain. Small methods, small
> classes whenever possible. I also suspect that there may be some gains in
> performance to be had if we rely more on the container (which will already
> be dispatching based on path) to choose our code paths (at least at a
> coarse level) and then have less custom dispatch logic executed on *every*
> request
> >> >>>
> >> >>> Obviously I'm wrong and if the net result is less performant to any
> significant degree forget it that wouldn't be worth it. (wanted:
> standardized solr benchmarks)
> >> >>>
> >> >>> There WILL be complications with v2 because it is a subclass of
> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
> should be a separate servlet from v1, but we don't want to duplicate code
> either... so work to do there...
> >> >>>
> >> >>> I think an incremental approach is necessary since very few of us
> have the bandwidth to more than that and certainly it becomes difficult to
> find anyone with the time to review large changes. What I have thus far is
> stable with respect to the tests, and simplifies some stuff already which
> is why I chose this point to start the discussion
> >> >>>
> >> >>> But yeah, look at what I did and say what you like and what you
> don't. :).
> >> >>>
> >> >>> -Gus
> >> >>>
> >> >>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
> wrote:
> >> >>>
> >> >>> Sounds like an interesting adventure last weekend.
> >> >>>
> >> >>> I'm unclear what the point of going this direction is; my instinct
> is to go the opposite direction.  You seem to suggest there are some
> simplification/organization benefits, which I love, so I'll need to look at
> what you've done to judge that for myself.  Yes Jetty supports the Servlet
> spec but we need not embrace it.  Adhering to that is useful if you have a
> generic web app that can be deployed to a container of the user's
> convenience/choosing.  No doubt this is why Solr started this way, and why
> the apps I built in my early days adhered to that spec.  But since 6.0, we
> view Solr as self-contained and more supportable if the project makes these
> decisions and thus not needlessly constrain itself as well.
> >> >>>
> >> >>> It is super weird to me that SolrDispatchFilter is a Servlet
> *Filter* and not a *Servlet* itself.
> >> >>>
> >> >>> Also, I suspect there may be complications in changes here relating
> to Solr's v1 vs v2 API.  And most definitely also what you discovered --
> JettySolrRunner.
> >> >>>
> >> >>> ~ David Smiley
> >> >>> Apache Lucene/Solr Search Developer
> >> >>> http://www.linkedin.com/in/davidwsmiley
> >> >>>
> >> >>>
> >> >>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com>
> wrote:
> >> >>>
> >> >>> TLDR: I've got a working branch where CoreContainer & our startup
> process is extracted from SolrDispatch Filter. Do other folks think that is
> interesting enough that I should make a JIRA and/or PR with intention to
> make this change?
> >> >>>
> >> >>> Details:
> >> >>>
> >> >>> Jetty is a servlet container, yet we more or less ignore it by
> stuffing everything into a single filter (almost, admin UI is served
> separately). I'm sure there are lots of historical reasons for this,
> probably including servlet containers and their specs were much less mature
> when solr was first started. Maybe also the early authors were more focused
> on making search work than leveraging what the container could do for them
> (but I wasn't there for that so that's just a guess).
> >> >>>
> >> >>> The result is that we have a couple of very large classes that are
> touched by almost every request, and they have a lot of conditional logic
> trying to decide what the user is asking. Normally this sort of dispatch is
> done by the container based on the request URL to direct it to the
> appropriate servlet. Solr does a LOT of different things so this code is
> extremely tricky and complex to understand. Specifically, I'm speaking of
> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
> two classes probably makes them harder to understand.
> >> >>>
> >> >>> It seems to me (and this mail is asking if you agree) that these
> classes are long overdue for some subdivision. The most obvious thing to
> pull out is all the admin calls. Admin code paths really have little or
> nothing to do with query or update code paths since there are no documents
> to route or sub-requests to some subset of nodes.
> >> >>>
> >> >>> The primary obstacle to any such separation and simplification is
> that most requests have some interaction with CoreContainer, and the things
> it holds, and this is initialized and held by a field in
> SolrDispatchFilter. After spending a significant chunk of time reading this
> code in the prior weeks and a timely and motivating conversation with Eric
> Pugh, I dumped a chunk of my weekend into an experiment to see if I could
> pull CoreContainer out of the dispatch filter, and leverage the facilities
> of our servlet container.
> >> >>>
> >> >>> That wasn't too terribly hard, but keeping JettySolrRunner happy
> was very confusing, and worrisome since I've realized it's not respecting
> our web.xml at all, and any configuration in web.xml needs to be duplicated
> for our tests in JettySolrRunner (tangent alert)
> >> >>>
> >> >>> The result is that CoreContainer is now held by a class called
> CoreService (please help me name things if you don't like my names :) ).
> CoreService is a ServletContextListener, appropriately configured in
> web.xml, and has a static method that can be used to get a reference to the
> CoreContainer corresponding to the ServletContext in which code wanting a
> core container is running (this supports having multiple JettySolrRunners
> in tests, though probably never has more than one CoreContainer in the
> running application)
> >> >>>
> >> >>> I achieved this in 4 stages shown here:
> https://github.com/gus-asf/solr/tree/servlet-solr
> >> >>>
> >> >>> Ignore the AdminServlet class, it's a placeholder, and can be
> subtracted without harm.
> >> >>>
> >> >>> Since the current state of the code in that branch is apparently
> test-stable (4 runs of check in a row passing, none slower than any run of
> 3 runs of main, both as low as 10.5 min if I don't continue working on the
> machine)...
> >> >>>
> >> >>> Do we want to push this refactor in now to avoid making a huge ball
> of changes that gets harder and harder to merge? The next push point would
> probably be when AdminServlet was functional (if/when that happens) (and we
> could not push that class for now).
> >> >>>
> >> >>> If you read this far, thanks :) I wasn't sure how feasible this
> would be so I felt the need to prove it to my self in code before wasting
> your time, but please don't hesitate to point to costs I might be missing
> or anything that looks like a mistake, or say why this was a total waste of
> time :)
> >> >>>
> >> >>> Thoughts?
> >> >>>
> >> >>> -Gus
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>> --
> >> >>> - Mark
> >> >>>
> >> >>> http://about.me/markrmiller
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> http://www.needhamsoftware.com (work)
> >> >>> http://www.the111shift.com (play)
> >> >>>
> >> >>> --
> >> >>> - Mark
> >> >>>
> >> >>> http://about.me/markrmiller
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> http://www.needhamsoftware.com (work)
> >> >> http://www.the111shift.com (play)
> >> >
> >> > --
> >> > - Mark
> >> >
> >> > http://about.me/markrmiller
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> >> For additional commands, e-mail: dev-help@solr.apache.org
> >>
> > --
> > - Mark
> >
> > http://about.me/markrmiller
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> For additional commands, e-mail: dev-help@solr.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Jason Gerlowski <ge...@gmail.com>.
I don't know enough about Jetty or startup times or webapp packaging
options to respond to much of the later discussion on this thread.

But I do know how hard HttpSolrCall and SolrDispatchFilter are to
understand and how painful they've been to maintain.  They've slowed
my own development down in the past and have been a source of at least
a few bugs (I'm remembering some auth-framework bugs in particular,
though I'm sure there have been others.)  Anyway, speaking from that
limited viewpoint, I'd love to see the work in Gus' PR land.  Thanks
for the weekend-experiment man!

Jason

On Thu, Aug 19, 2021 at 12:57 PM Mark Miller <ma...@gmail.com> wrote:
>
> You could say that about a 1000 of the things I could bring it up. I’m sure I’ve said it.
>
> So I what I did was eliminate the big discussions comment. And ripped through thousands. So IMO, there you go. A nice chilling affect that has little to do with nuts and bolts software engineering seizing up the majority of developers instincts.
>
> That and the fact that it’s hard to see those little bit of work paths. They are covered and obscured by weeds and thorns. But they are there.
>
> My schedule is full of stuff that has nothing to do with these things. I’m not reopening any discussions myself. But I will match good work.
>
> If someone thinks QuickStart or something along the lines of Uwe’s design is worth tackling - that startup performance matters - I’ll add an option to eat the next bottleneck - dynamically and concurrently loading the 10-15k classes. If someone makes a half way sensible rest api on the server - I’ll work on a much stronger http2 api.
>
> The technical work is not hard. The other shit is maybe not that hard. I never found it hard. But I gave it up. And before I did, I told a couple, look at ethereum. Look at the love meetings they do. Look at the email interactions. Look at the conference. Look at the abstract common goals and agreement. Look at the in the ground disfunction and disagreement. And maybe try to do something.
>
> MRM
>
> On Thu, Aug 19, 2021 at 8:06 AM Alexandre Rafalovitch <ar...@gmail.com> wrote:
>>
>> > But a little more effort on those json files that make up V2
>> I looked into that and during the in-depth testing realised that V2
>> did not support default values (it writes out current values only),
>> nor did it have a proper round-trip story, especially for the nested
>> configurations. So, it was - architecturally - not at parity with the
>> way Solr did configuration.
>>
>> I documented my findings in a bunch of JIRAs with the umbrella in
>> SOLR-14795 but stopped after that, as I could not figure out the way
>> to do that "little more effort" without reopening big discussions.
>>
>> Not saying it is not possible (or to derail the parent conversation),
>> just my two cents on attempting that kind of "a small amount of
>> effort".
>>
>> Regards,
>>    Alex.
>>
>> On Thu, 19 Aug 2021 at 02:05, Mark Miller <ma...@gmail.com> wrote:
>> >
>> > The gap is not ideal and I’m not advocating for it, but it does help keep you honest. There is gap between a mini cluster and a packed dist regardless and it’s really something that should be covered by testing even with more of a lineup.
>> >
>> > I’m very into dependency injection in Solr. I was adding juice at the end of branch 2.5. The benefit to testing is well known, but wasn’t even looking at that.
>> >
>> > Our class dependencies are a major hindrance. From everywhere, you need access to the CoreContainer. And ZkController. And then make chained calls all over the place to reach back and forth into each other.
>> >
>> > Need a zksolrclient? Start chaining method calls until you find it. ZkstateReader? Pretty much anything. You find a friendly root and start reaching back and forth. It’s a very verbose and complicated framework. And with issues depending on the class and implementation.  Because threads are started in constructors and everything is so interdependent, you actually race spinning the wild up.
>> >
>> > Getting the construction, and lifecycle and access to all the core widely used classes is delicate work. And is essentially given to everyone, everywhere.
>> >
>> > So I built everything at the top, in a guice configuration. Lifecycle and construction and dependencies mostly collocated in the code in one spot. Classes that need access get those objects injected. No looking for, chaining for, constructing a second or third instance for …
>> >
>> > And then what I was starting on was, because we have all these complex objects and difficult to understand lifecycle and access, and in many many places - mistakes are made (Someone closes something they shouldn’t. Makes a call they are not supposed to) And so I started working on having most of the key, long lived objects managed by a small core layer of code. And the objects that got injected where getting specialized - only the core lifecycle code can close a zksolrclient. The method doesn’t exist or throws and exception for the request code. Requests can access the resources - easily discoverable, and without the ability to mess up with the unnecessary power they are given.
>> >
>> > So just like api V2, you can find a hint of this in the dual api for zk access. And like V2, half baked, undocumented. Team tackled by like one, for a while, to some limited extent. And so again you have an independent idea built on a gold nugget that ends up increasing complexity and variation and surface area and ability to understand. The core of the code always large enough to barrel along in the same direction while some independent, time limited efforts, start pulling the mass wider in objectively reasonable goals. But it’s just a wider mass barreling forward.
>> >
>> > Essentially to say, yes. Dependency injection.  You can shrink the world. Give the average developer half a chance.
>> >
>> > Request dispatching and the rest api should be the same deal. Out of the way. Done with a focus to shrink the world and remove the power and let search developers focus on search. I look at the V2 api and think, man, it’s like someone saw gold and then went after a nickel. We shouldn’t be in the rest api and dispatching business.
>> >
>> > Let’s say you implement an OpenApi or SwaggerApi. You get tools. Web, ide, cmd line.   You get code generation. Server and client. Editors. You get automated documentation creation. You get a standard and compatibility that’s handled by a huge community dedicated to it. When devs have to work on it, it’s simple, it’s documented and tutorialed and talked about - outside of the project and the one or two hobbiests that have taken a passing interest in a rest apia and dispatching.
>> >
>> > And V2 seems to look at that and take a bite or two. But leave almost eveything on the table. A lot of code. Very non standard custom, undocumented code. Non of the tool support. Pretty heavy work to even unwind it with few  pointers to help.  Still no standard. Self documenting at a very low level. And with very little of the benefit. When you follow these specs, not only is most of the work down for you beyond specification, but your ide auto completes your urls. Tools auto mock testing. A whole bunch of crap pushed away.
>> >
>> > The stuff that Solr is actually good at is the core of the actual search functionality. But the amount of out of scope and poor code surrounding that is what drags in the whole system. And most of it can be kicked right out of the project and into code and specs solved and maintained by people that actually do that work for a living.
>> >
>> > Here and there we have some special needs. No doubt. But a little more effort on those json files that make up V2? And at the push of a button you have the basic working code for a full client. For a jetty server implementation. And very pretty, full, exampled, generated documentation. Just a click and a small amount of effort. Even if you say, while the search api, and some of the performance and special case stuff… okay, okay, I think there’s likely a half way point, but let’s just say you take the whole admin api off the table. Beautiful. At least the rest is search related.
>> >
>> >
>> > MRM
>> >
>> > On Wed, Aug 18, 2021 at 11:51 AM Gus Heck <gu...@gmail.com> wrote:
>> >>
>> >> The thing that concerned me most about JettySolrRunner is that it differed from what we use in production essentially leaving a gap of untested functionality, so if we move one or the other to match I'm pretty much happy. The thing it (and ServeletContextHandler) don't do is start anything else without instructions, which is the good and bad. I've not had any microservice oriented clients so startup time hasn't been a thing I thought much about but all the points above make sense in that light.
>> >>
>> >> Since SIP-6 doesn't contemplate anything like replacing jetty/servlets entirely I don't think anything I did here stands in its way, just a slightly longer list of things to instantiate (conveniently documented in web.xml) but code review would be a happy thing in case I missed something on that front.
>> >>
>> >> Even what Uwe describes seems compatible, and however we go it would be MUCH better to have the ability to inject dependencies into the things we start. Because of the unit tests we can't ever use any singleton that has state, so that's out and threadlocals are too broad, so without control to inject I was forced to set up a delicate dance with a weak map keyed on the servlet contexts which is not my favorite thing, but it worked.
>> >>
>> >> -Gus
>> >>
>> >> On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Clearly a refactor is needed here, thanks Gus for diving in deep.
>> >>>
>> >>> Like Uwe I don't want us to be more dependent upon web.xml or Jetty's startup logic. That's why I filed SIP-6 (https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process) to try to describe the vision that Uwe just described. If such a bootstrap redesign makes the dispatch cleanup easier or harder I don't know. Perhaps one should be done before the other? Or shuold be viewed as part of the same rewrite?
>> >>>
>> >>> Jan
>> >>>
>> >>> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Thanks Mark, I fully agree with you on this: Booting up jetty with start.jar and using a web application is dramatically slower.
>> >>>
>> >>> Nevertheless, I think that we should go away with the web application alltogether. We can still stay with the servlet API, but just – as in Gus patch – to split responsibilities and go away with a singleton ServletFilter, although from my own experience, I figured out that Servletapis’s URI handling is far away from ideal. In most cases at end I always landed in a separate “master” servlet (hooked to “/*”) that only parses the PathInfo with regular expressions (one thing that does not work with servlet api) and then delegates using RequestDispatcher to several more specific servlets or returns 404 not found. This setup would also help to split the huge DispatchFilter mess, but still the URI parsing logic would be centralized. But this servlet would only take care of the dispatching, so DispatchServlet would be fine. But it should only dispatch, nothing else.
>> >>>
>> >>> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the main distribution and remove start.jar completely. Also remove jetty-webapp.jar and related dependencies. In Boostrap just start Jetty embedded, add listen ip, port numbers, etc. We can still use the config API of Jetty for this, so maybe put this into an XML file (although XML is a bad format for this stuff, goes back to web applications). Then Bootsrap starts the Corecontainer and the embedded Zookeeper (this is another horrible thing in Solr: the embedded zookeper starts somehow as part of the web application) and hooks the servlets into a (at first) single root ServletContextHandler. This is upside down. So Solr starts with a Bootstrap class, starts zookeeper, jetty and hands over to CoreContainer.
>> >>>
>> >>> Some other ideas (not sure if they are already implemented in the patch): Maybe we should install a separate ServletContextHandler/Classloader for each Core. This would allow us to go away with our own classloader magic. Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin UI and the Core/Collection management and then hands over to the CoreContainer to initialize all Cores. Each CoreContainer gets his own ServletContextHandler and can therefore easily be loaded unloaded from Jetty.
>> >>>
>> >>> I wrote many other apps in my daily live like this, the startup time (especially for microservices) is great. A Jetty starting up in milliseconds and going to service is unbeatable.
>> >>>
>> >>> Uwe
>> >>>
>> >>> -----
>> >>> Uwe Schindler
>> >>> Achterdiek 19, D-28357 Bremen
>> >>> https://www.thetaphi.de
>> >>> eMail: uwe@thetaphi.de
>> >>>
>> >>> From: Mark Miller <ma...@gmail.com>
>> >>> Sent: Wednesday, August 18, 2021 9:08 AM
>> >>> To: Gus Heck <gu...@gmail.com>
>> >>> Cc: dev@solr.apache.org
>> >>> Subject: Re: Solr and Jetty and Servlets
>> >>>
>> >>> Agreement or not, I wouldn’t step in front of whatever you did.
>> >>>
>> >>> I doubt anyone is going to take on actually pulling out of a web app because it’s easier to lean on it for configuration and I doubt start up speed is a very high priority for most.
>> >>>
>> >>> It had never really occurred to me in the past that JettySolrRunner was so much faster. I just know it because of driving the test down to ridiculously times, starting up and stopping mini clusters of whatever size in literally no time. With no QuickStart setup for embedded Jetty.
>> >>>
>> >>> So than starting up standalone Solr, it became apparent how much slower it was. So I started poking around. Cloud guys like those at google had already been feeling pain - preferring to be able to spin up and down relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s thing how they were dealing with it. And it’s simple, but it’s still easily non comparable at the end of the day. Solr ships with like 10,000 classes across a ton of jars, even with QuickStart, you have to follow the spec of what to do and what might be done by any given webapp - though as JettySolrRunner shows, we don’t actually need any of it. QuickStart is trying to find every trick to speed up following the container and webapp spec. Injecting the dispatch filter just tosses it all out :)
>> >>>
>> >>> Any, may point was not that I disagree and you should make Solr inject a dispatch filter - my point is just that it doesn’t really matter to dispatch. You can do just as much via Jetty either way.
>> >>>
>> >>> And anyone that doesn’t agree about “something should be done” is either lying or not looking. So I wouldn’t worry about agreement. The problem is that it’s quicksand that people are avoiding, not that anyone that looks can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted to work towards getting out of that hole. But at the end of the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But look at it - there was hope there. Hope that one day, dispatching would be better. Rather than twice as complicated. I don’t know that hope lives on. But no one gonna stop you on reasonable technical sense with any proper license.
>> >>>
>> >>> Mark
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> *ServletContextHandler not ServletContext
>> >>>
>> >>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> Ok nope, since that only happens with WebAppContext, not ServletContext (the parent class). whew :)
>> >>>
>> >>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus lack of metadata-complete=true in web.xml means that we wind up scanning for annotations?
>> >>>
>> >>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> https://issues.apache.org/jira/browse/SOLR-15590
>> >>>
>> >>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> Ok well it would be interesting to compare quickstarting vs JettySolrRunner, and reading up on quickstart gives ideas about pre-building the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either way we are still in a container and I think I hear some agreement that something should be done about the dispatch here, and both of you seem to agree that an actual servlet would make more sense than a filter, so I'll make a ticket/pr to make it easier to track & easier to read the code.
>> >>>
>> >>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com> wrote:
>> >>>
>> >>> The downside to respecting web.xml and making JettySolrServer serve a webapp is that loading a webapp is very expensive and slow in comparison.  JettySolrServer actually starts up extremely quickly. It’s almost more appealing to change the Server to use the JettySolrServer strategy. It’s so slow to load a webapp because of all the things it needs to support and scan jars for - in a kind of JavaEE situation, though not as bad. Jetty QuickStart does improve the situation if used at least. But there is really no need to eat the whole webapp standard for a non webapp app to have Jetty manage more of the dispatching. I always wondering about the motivation / upside of hacking in a straight servlet myself. But for a non webapp stuck in a servlet container, it’s actually a beautiful move that side steps a bunch of slow crap even better than the QuickStart pushes ever will, and still allows for matching any support we would want.
>> >>>
>> >>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I made a slim SolrCall base class that each extends. All that logic is hairy enough to follow without them interleaving and sharing in a kind of wild collaboration. It’s pretty unfortunate they both still exist.
>> >>>
>> >>> You are right, leaving that path in that state is a poor idea. If there was anything that could be considered the “hot” path, it’s right there - feverishly checking and ripping up streams for anyone and anything. Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path down? Do a dance? Treat the string like a date and see if that works.
>> >>>
>> >>> So yeah, agreed, nobody does or would dispatch this way, you have to frog boil into it. It’s slow, almost incomprehensible, and incredibly good at holding onto life, even building on it. But being a webapp and parsing web.xml has little to do with dispatching and having sensible http api that doesn’t work like it’s a perl compiler.
>> >>>
>> >>> Anyway, I can’t imagine trying to trace through that code with any solid feeling about having a handle on it via inspection. But just like Perl, if you debug / trace log the hell out of it, it is all actually pretty simple to adjust and reign in.
>> >>>
>> >>>
>> >>> MRM
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> So I'm not interested in *adhering* to anything here, just using stuff where it helps... I see tools sitting on the shelf, already bought and paid for and carried with us to every job but then they sit in the back of the truck mostly unused... and (I think) they look useful.
>> >>>
>> >>> This should not be seen as in any way advocating a return to war file deployment or "choose your container". All of what I would suggest should likely be compatible with a jetty startup controlled by our code (I assume we can convince it to read web.xml properly when doing so). Certainly point out anything that would interfere with that.
>> >>>
>> >>> Here we sit in a container that is a tool box containing:
>> >>>
>> >>> A nice mechanism for initializing stuff at the application level, (ServletContextListener) and a well defined guarantee that it runs before any filter or servlet,
>> >>> An automatic shutdown mechanism that it will call for us on graceful shutdown (ServletContextListner again).
>> >>> A nice layered filter chain mechanism which could have been used to layer in things like tracing and authentication, and close shields etc as small succinct filters rather than weaving them into an ever more complex filter & call class.
>> >>> In more recent versions of the spec, for listeners defined in web.xml the order is also guaranteed.
>> >>> Servlet classes that are *already* set up to distinguish http verbs automatically when desired
>> >>>
>> >>> So why is this better? Because monster classes that do a hundred things are really hard to understand and maintain. Small methods, small classes whenever possible. I also suspect that there may be some gains in performance to be had if we rely more on the container (which will already be dispatching based on path) to choose our code paths (at least at a coarse level) and then have less custom dispatch logic executed on *every* request
>> >>>
>> >>> Obviously I'm wrong and if the net result is less performant to any significant degree forget it that wouldn't be worth it. (wanted: standardized solr benchmarks)
>> >>>
>> >>> There WILL be complications with v2 because it is a subclass of HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it should be a separate servlet from v1, but we don't want to duplicate code either... so work to do there...
>> >>>
>> >>> I think an incremental approach is necessary since very few of us have the bandwidth to more than that and certainly it becomes difficult to find anyone with the time to review large changes. What I have thus far is stable with respect to the tests, and simplifies some stuff already which is why I chose this point to start the discussion
>> >>>
>> >>> But yeah, look at what I did and say what you like and what you don't. :).
>> >>>
>> >>> -Gus
>> >>>
>> >>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>> >>>
>> >>> Sounds like an interesting adventure last weekend.
>> >>>
>> >>> I'm unclear what the point of going this direction is; my instinct is to go the opposite direction.  You seem to suggest there are some simplification/organization benefits, which I love, so I'll need to look at what you've done to judge that for myself.  Yes Jetty supports the Servlet spec but we need not embrace it.  Adhering to that is useful if you have a generic web app that can be deployed to a container of the user's convenience/choosing.  No doubt this is why Solr started this way, and why the apps I built in my early days adhered to that spec.  But since 6.0, we view Solr as self-contained and more supportable if the project makes these decisions and thus not needlessly constrain itself as well.
>> >>>
>> >>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and not a *Servlet* itself.
>> >>>
>> >>> Also, I suspect there may be complications in changes here relating to Solr's v1 vs v2 API.  And most definitely also what you discovered -- JettySolrRunner.
>> >>>
>> >>> ~ David Smiley
>> >>> Apache Lucene/Solr Search Developer
>> >>> http://www.linkedin.com/in/davidwsmiley
>> >>>
>> >>>
>> >>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>> >>>
>> >>> TLDR: I've got a working branch where CoreContainer & our startup process is extracted from SolrDispatch Filter. Do other folks think that is interesting enough that I should make a JIRA and/or PR with intention to make this change?
>> >>>
>> >>> Details:
>> >>>
>> >>> Jetty is a servlet container, yet we more or less ignore it by stuffing everything into a single filter (almost, admin UI is served separately). I'm sure there are lots of historical reasons for this, probably including servlet containers and their specs were much less mature when solr was first started. Maybe also the early authors were more focused on making search work than leveraging what the container could do for them (but I wasn't there for that so that's just a guess).
>> >>>
>> >>> The result is that we have a couple of very large classes that are touched by almost every request, and they have a lot of conditional logic trying to decide what the user is asking. Normally this sort of dispatch is done by the container based on the request URL to direct it to the appropriate servlet. Solr does a LOT of different things so this code is extremely tricky and complex to understand. Specifically, I'm speaking of SolrDispatchFilter and HttpSolrCall, which are so inseparable that being two classes probably makes them harder to understand.
>> >>>
>> >>> It seems to me (and this mail is asking if you agree) that these classes are long overdue for some subdivision. The most obvious thing to pull out is all the admin calls. Admin code paths really have little or nothing to do with query or update code paths since there are no documents to route or sub-requests to some subset of nodes.
>> >>>
>> >>> The primary obstacle to any such separation and simplification is that most requests have some interaction with CoreContainer, and the things it holds, and this is initialized and held by a field in SolrDispatchFilter. After spending a significant chunk of time reading this code in the prior weeks and a timely and motivating conversation with Eric Pugh, I dumped a chunk of my weekend into an experiment to see if I could pull CoreContainer out of the dispatch filter, and leverage the facilities of our servlet container.
>> >>>
>> >>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very confusing, and worrisome since I've realized it's not respecting our web.xml at all, and any configuration in web.xml needs to be duplicated for our tests in JettySolrRunner (tangent alert)
>> >>>
>> >>> The result is that CoreContainer is now held by a class called CoreService (please help me name things if you don't like my names :) ). CoreService is a ServletContextListener, appropriately configured in web.xml, and has a static method that can be used to get a reference to the CoreContainer corresponding to the ServletContext in which code wanting a core container is running (this supports having multiple JettySolrRunners in tests, though probably never has more than one CoreContainer in the running application)
>> >>>
>> >>> I achieved this in 4 stages shown here: https://github.com/gus-asf/solr/tree/servlet-solr
>> >>>
>> >>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted without harm.
>> >>>
>> >>> Since the current state of the code in that branch is apparently test-stable (4 runs of check in a row passing, none slower than any run of 3 runs of main, both as low as 10.5 min if I don't continue working on the machine)...
>> >>>
>> >>> Do we want to push this refactor in now to avoid making a huge ball of changes that gets harder and harder to merge? The next push point would probably be when AdminServlet was functional (if/when that happens) (and we could not push that class for now).
>> >>>
>> >>> If you read this far, thanks :) I wasn't sure how feasible this would be so I felt the need to prove it to my self in code before wasting your time, but please don't hesitate to point to costs I might be missing or anything that looks like a mistake, or say why this was a total waste of time :)
>> >>>
>> >>> Thoughts?
>> >>>
>> >>> -Gus
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>> --
>> >>> - Mark
>> >>>
>> >>> http://about.me/markrmiller
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.needhamsoftware.com (work)
>> >>> http://www.the111shift.com (play)
>> >>>
>> >>> --
>> >>> - Mark
>> >>>
>> >>> http://about.me/markrmiller
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> http://www.needhamsoftware.com (work)
>> >> http://www.the111shift.com (play)
>> >
>> > --
>> > - Mark
>> >
>> > http://about.me/markrmiller
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>> For additional commands, e-mail: dev-help@solr.apache.org
>>
> --
> - Mark
>
> http://about.me/markrmiller

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org


Re: Solr and Jetty and Servlets

Posted by Mark Miller <ma...@gmail.com>.
You could say that about a 1000 of the things I could bring it up. I’m sure
I’ve said it.

So I what I did was eliminate the big discussions comment. And ripped
through thousands. So IMO, there you go. A nice chilling affect that has
little to do with nuts and bolts software engineering seizing up the
majority of developers instincts.

That and the fact that it’s hard to see those little bit of work paths.
They are covered and obscured by weeds and thorns. But they are there.

My schedule is full of stuff that has nothing to do with these things. I’m
not reopening any discussions myself. But I will match good work.

If someone thinks QuickStart or something along the lines of Uwe’s design
is worth tackling - that startup performance matters - I’ll add an option
to eat the next bottleneck - dynamically and concurrently loading the
10-15k classes. If someone makes a half way sensible rest api on the server
- I’ll work on a much stronger http2 api.

The technical work is not hard. The other shit is maybe not that hard. I
never found it hard. But I gave it up. And before I did, I told a couple,
look at ethereum. Look at the love meetings they do. Look at the email
interactions. Look at the conference. Look at the abstract common goals and
agreement. Look at the in the ground disfunction and disagreement. And
maybe try to do something.

MRM

On Thu, Aug 19, 2021 at 8:06 AM Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> > But a little more effort on those json files that make up V2
> I looked into that and during the in-depth testing realised that V2
> did not support default values (it writes out current values only),
> nor did it have a proper round-trip story, especially for the nested
> configurations. So, it was - architecturally - not at parity with the
> way Solr did configuration.
>
> I documented my findings in a bunch of JIRAs with the umbrella in
> SOLR-14795 but stopped after that, as I could not figure out the way
> to do that "little more effort" without reopening big discussions.
>
> Not saying it is not possible (or to derail the parent conversation),
> just my two cents on attempting that kind of "a small amount of
> effort".
>
> Regards,
>    Alex.
>
> On Thu, 19 Aug 2021 at 02:05, Mark Miller <ma...@gmail.com> wrote:
> >
> > The gap is not ideal and I’m not advocating for it, but it does help
> keep you honest. There is gap between a mini cluster and a packed dist
> regardless and it’s really something that should be covered by testing even
> with more of a lineup.
> >
> > I’m very into dependency injection in Solr. I was adding juice at the
> end of branch 2.5. The benefit to testing is well known, but wasn’t even
> looking at that.
> >
> > Our class dependencies are a major hindrance. From everywhere, you need
> access to the CoreContainer. And ZkController. And then make chained calls
> all over the place to reach back and forth into each other.
> >
> > Need a zksolrclient? Start chaining method calls until you find it.
> ZkstateReader? Pretty much anything. You find a friendly root and start
> reaching back and forth. It’s a very verbose and complicated framework. And
> with issues depending on the class and implementation.  Because threads are
> started in constructors and everything is so interdependent, you actually
> race spinning the wild up.
> >
> > Getting the construction, and lifecycle and access to all the core
> widely used classes is delicate work. And is essentially given to everyone,
> everywhere.
> >
> > So I built everything at the top, in a guice configuration. Lifecycle
> and construction and dependencies mostly collocated in the code in one
> spot. Classes that need access get those objects injected. No looking for,
> chaining for, constructing a second or third instance for …
> >
> > And then what I was starting on was, because we have all these complex
> objects and difficult to understand lifecycle and access, and in many many
> places - mistakes are made (Someone closes something they shouldn’t. Makes
> a call they are not supposed to) And so I started working on having most of
> the key, long lived objects managed by a small core layer of code. And the
> objects that got injected where getting specialized - only the core
> lifecycle code can close a zksolrclient. The method doesn’t exist or throws
> and exception for the request code. Requests can access the resources -
> easily discoverable, and without the ability to mess up with the
> unnecessary power they are given.
> >
> > So just like api V2, you can find a hint of this in the dual api for zk
> access. And like V2, half baked, undocumented. Team tackled by like one,
> for a while, to some limited extent. And so again you have an independent
> idea built on a gold nugget that ends up increasing complexity and
> variation and surface area and ability to understand. The core of the code
> always large enough to barrel along in the same direction while some
> independent, time limited efforts, start pulling the mass wider in
> objectively reasonable goals. But it’s just a wider mass barreling forward.
> >
> > Essentially to say, yes. Dependency injection.  You can shrink the
> world. Give the average developer half a chance.
> >
> > Request dispatching and the rest api should be the same deal. Out of the
> way. Done with a focus to shrink the world and remove the power and let
> search developers focus on search. I look at the V2 api and think, man,
> it’s like someone saw gold and then went after a nickel. We shouldn’t be in
> the rest api and dispatching business.
> >
> > Let’s say you implement an OpenApi or SwaggerApi. You get tools. Web,
> ide, cmd line.   You get code generation. Server and client. Editors. You
> get automated documentation creation. You get a standard and compatibility
> that’s handled by a huge community dedicated to it. When devs have to work
> on it, it’s simple, it’s documented and tutorialed and talked about -
> outside of the project and the one or two hobbiests that have taken a
> passing interest in a rest apia and dispatching.
> >
> > And V2 seems to look at that and take a bite or two. But leave almost
> eveything on the table. A lot of code. Very non standard custom,
> undocumented code. Non of the tool support. Pretty heavy work to even
> unwind it with few  pointers to help.  Still no standard. Self documenting
> at a very low level. And with very little of the benefit. When you follow
> these specs, not only is most of the work down for you beyond
> specification, but your ide auto completes your urls. Tools auto mock
> testing. A whole bunch of crap pushed away.
> >
> > The stuff that Solr is actually good at is the core of the actual search
> functionality. But the amount of out of scope and poor code surrounding
> that is what drags in the whole system. And most of it can be kicked right
> out of the project and into code and specs solved and maintained by people
> that actually do that work for a living.
> >
> > Here and there we have some special needs. No doubt. But a little more
> effort on those json files that make up V2? And at the push of a button you
> have the basic working code for a full client. For a jetty server
> implementation. And very pretty, full, exampled, generated documentation.
> Just a click and a small amount of effort. Even if you say, while the
> search api, and some of the performance and special case stuff… okay, okay,
> I think there’s likely a half way point, but let’s just say you take the
> whole admin api off the table. Beautiful. At least the rest is search
> related.
> >
> >
> > MRM
> >
> > On Wed, Aug 18, 2021 at 11:51 AM Gus Heck <gu...@gmail.com> wrote:
> >>
> >> The thing that concerned me most about JettySolrRunner is that it
> differed from what we use in production essentially leaving a gap of
> untested functionality, so if we move one or the other to match I'm pretty
> much happy. The thing it (and ServeletContextHandler) don't do is start
> anything else without instructions, which is the good and bad. I've not had
> any microservice oriented clients so startup time hasn't been a thing I
> thought much about but all the points above make sense in that light.
> >>
> >> Since SIP-6 doesn't contemplate anything like replacing jetty/servlets
> entirely I don't think anything I did here stands in its way, just a
> slightly longer list of things to instantiate (conveniently documented in
> web.xml) but code review would be a happy thing in case I missed something
> on that front.
> >>
> >> Even what Uwe describes seems compatible, and however we go it would be
> MUCH better to have the ability to inject dependencies into the things we
> start. Because of the unit tests we can't ever use any singleton that has
> state, so that's out and threadlocals are too broad, so without control to
> inject I was forced to set up a delicate dance with a weak map keyed on the
> servlet contexts which is not my favorite thing, but it worked.
> >>
> >> -Gus
> >>
> >> On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Clearly a refactor is needed here, thanks Gus for diving in deep.
> >>>
> >>> Like Uwe I don't want us to be more dependent upon web.xml or Jetty's
> startup logic. That's why I filed SIP-6 (
> https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process)
> to try to describe the vision that Uwe just described. If such a bootstrap
> redesign makes the dispatch cleanup easier or harder I don't know. Perhaps
> one should be done before the other? Or shuold be viewed as part of the
> same rewrite?
> >>>
> >>> Jan
> >>>
> >>> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
> >>>
> >>> Hi,
> >>>
> >>> Thanks Mark, I fully agree with you on this: Booting up jetty with
> start.jar and using a web application is dramatically slower.
> >>>
> >>> Nevertheless, I think that we should go away with the web application
> alltogether. We can still stay with the servlet API, but just – as in Gus
> patch – to split responsibilities and go away with a singleton
> ServletFilter, although from my own experience, I figured out that
> Servletapis’s URI handling is far away from ideal. In most cases at end I
> always landed in a separate “master” servlet (hooked to “/*”) that only
> parses the PathInfo with regular expressions (one thing that does not work
> with servlet api) and then delegates using RequestDispatcher to several
> more specific servlets or returns 404 not found. This setup would also help
> to split the huge DispatchFilter mess, but still the URI parsing logic
> would be centralized. But this servlet would only take care of the
> dispatching, so DispatchServlet would be fine. But it should only dispatch,
> nothing else.
> >>>
> >>> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to
> the main distribution and remove start.jar completely. Also remove
> jetty-webapp.jar and related dependencies. In Boostrap just start Jetty
> embedded, add listen ip, port numbers, etc. We can still use the config API
> of Jetty for this, so maybe put this into an XML file (although XML is a
> bad format for this stuff, goes back to web applications). Then Bootsrap
> starts the Corecontainer and the embedded Zookeeper (this is another
> horrible thing in Solr: the embedded zookeper starts somehow as part of the
> web application) and hooks the servlets into a (at first) single root
> ServletContextHandler. This is upside down. So Solr starts with a Bootstrap
> class, starts zookeeper, jetty and hands over to CoreContainer.
> >>>
> >>> Some other ideas (not sure if they are already implemented in the
> patch): Maybe we should install a separate
> ServletContextHandler/Classloader for each Core. This would allow us to go
> away with our own classloader magic. Bootstrap could also do this: It
> starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin
> UI and the Core/Collection management and then hands over to the
> CoreContainer to initialize all Cores. Each CoreContainer gets his own
> ServletContextHandler and can therefore easily be loaded unloaded from
> Jetty.
> >>>
> >>> I wrote many other apps in my daily live like this, the startup time
> (especially for microservices) is great. A Jetty starting up in
> milliseconds and going
> <https://www.google.com/maps/search/up+in+milliseconds+and+going+?entry=gmail&source=g>to
> service is unbeatable.
> >>>
> >>> Uwe
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> Achterdiek 19, D-28357 Bremen
> >>> https://www.thetaphi.de
> >>> eMail: uwe@thetaphi.de
> >>>
> >>> From: Mark Miller <ma...@gmail.com>
> >>> Sent: Wednesday, August 18, 2021 9:08 AM
> >>> To: Gus Heck <gu...@gmail.com>
> >>> Cc: dev@solr.apache.org
> >>> Subject: Re: Solr and Jetty and Servlets
> >>>
> >>> Agreement or not, I wouldn’t step in front of whatever you did.
> >>>
> >>> I doubt anyone is going to take on actually pulling out of a web app
> because it’s easier to lean on it for configuration and I doubt start up
> speed is a very high priority for most.
> >>>
> >>> It had never really occurred to me in the past that JettySolrRunner
> was so much faster. I just know it because of driving the test down to
> ridiculously times, starting up and stopping mini clusters of whatever size
> in literally no time. With no QuickStart setup for embedded Jetty.
> >>>
> >>> So than starting up standalone Solr, it became apparent how much
> slower it was. So I started poking around. Cloud guys like those at google
> had already been feeling pain - preferring to be able to spin up and down
> relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s
> thing how they were dealing with it. And it’s simple, but it’s still easily
> non comparable at the end of the day. Solr ships with like 10,000 classes
> across a ton of jars, even with QuickStart, you have to follow the spec of
> what to do and what might be done by any given webapp - though as
> JettySolrRunner shows, we don’t actually need any of it. QuickStart is
> trying to find every trick to speed up following the container and webapp
> spec. Injecting the dispatch filter just tosses it all out :)
> >>>
> >>> Any, may point was not that I disagree and you should make Solr inject
> a dispatch filter - my point is just that it doesn’t really matter to
> dispatch. You can do just as much via Jetty either way.
> >>>
> >>> And anyone that doesn’t agree about “something should be done” is
> either lying or not looking. So I wouldn’t worry about agreement. The
> problem is that it’s quicksand that people are avoiding, not that anyone
> that looks can’t see it a huge rats nest. Look at the V2 API. Someone
> actually wanted to work towards getting out of that hole. But at the end of
> the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But
> look at it - there was hope there. Hope that one day, dispatching would be
> better. Rather than twice as complicated. I don’t know that hope lives on.
> But no one gonna stop you on reasonable technical sense with any proper
> license.
> >>>
> >>> Mark
> >>>
> >>>
> >>>
> >>> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> *ServletContextHandler not ServletContext
> >>>
> >>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> Ok nope, since that only happens with WebAppContext, not
> ServletContext (the parent class). whew :)
> >>>
> >>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
> thus lack of metadata-complete=true in web.xml means that we wind up
> scanning for annotations?
> >>>
> >>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> https://issues.apache.org/jira/browse/SOLR-15590
> >>>
> >>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> Ok well it would be interesting to compare quickstarting vs
> JettySolrRunner, and reading up on quickstart gives ideas about
> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
> tangent. Either way we are still in a container and I think I hear some
> agreement that something should be done about the dispatch here, and both
> of you seem to agree that an actual servlet would make more sense than a
> filter, so I'll make a ticket/pr to make it easier to track & easier to
> read the code.
> >>>
> >>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
> wrote:
> >>>
> >>> The downside to respecting web.xml and making JettySolrServer serve a
> webapp is that loading a webapp is very expensive and slow in comparison.
> JettySolrServer actually starts up extremely quickly. It’s almost more
> appealing to change the Server to use the JettySolrServer strategy. It’s so
> slow to load a webapp because of all the things it needs to support and
> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
> QuickStart does improve the situation if used at least. But there is really
> no need to eat the whole webapp standard for a non webapp app to have Jetty
> manage more of the dispatching. I always wondering about the motivation /
> upside of hacking in a straight servlet myself. But for a non webapp stuck
> in a servlet container, it’s actually a beautiful move that side steps a
> bunch of slow crap even better than the QuickStart pushes ever will, and
> still allows for matching any support we would want.
> >>>
> >>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
> made a slim SolrCall base class that each extends. All that logic is hairy
> enough to follow without them interleaving and sharing in a kind of wild
> collaboration. It’s pretty unfortunate they both still exist.
> >>>
> >>> You are right, leaving that path in that state is a poor idea. If
> there was anything that could be considered the “hot” path, it’s right
> there - feverishly checking and ripping up streams for anyone and anything.
> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the
> path down? Do a dance? Treat the string like a date and see if that works.
> >>>
> >>> So yeah, agreed, nobody does or would dispatch this way, you have to
> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
> at holding onto life, even building on it. But being a webapp and parsing
> web.xml has little to do with dispatching and having sensible http api that
> doesn’t work like it’s a perl compiler.
> >>>
> >>> Anyway, I can’t imagine trying to trace through that code with any
> solid feeling about having a handle on it via inspection. But just like
> Perl, if you debug / trace log the hell out of it, it is all actually
> pretty simple to adjust and reign in.
> >>>
> >>>
> >>> MRM
> >>>
> >>>
> >>>
> >>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> So I'm not interested in *adhering* to anything here, just using stuff
> where it helps... I see tools sitting on the shelf, already bought and paid
> for and carried with us to every job but then they sit in the back of the
> truck mostly unused... and (I think) they look useful.
> >>>
> >>> This should not be seen as in any way advocating a return to war file
> deployment or "choose your container". All of what I would suggest should
> likely be compatible with a jetty startup controlled by our code (I assume
> we can convince it to read web.xml properly when doing so). Certainly point
> out anything that would interfere with that.
> >>>
> >>> Here we sit in a container that is a tool box containing:
> >>>
> >>> A nice mechanism for initializing stuff at the application level,
> (ServletContextListener) and a well defined guarantee that it runs before
> any filter or servlet,
> >>> An automatic shutdown mechanism that it will call for us on graceful
> shutdown (ServletContextListner again).
> >>> A nice layered filter chain mechanism which could have been used to
> layer in things like tracing and authentication, and close shields etc as
> small succinct filters rather than weaving them into an ever more complex
> filter & call class.
> >>> In more recent versions of the spec, for listeners defined in web.xml
> the order is also guaranteed.
> >>> Servlet classes that are *already* set up to distinguish http verbs
> automatically when desired
> >>>
> >>> So why is this better? Because monster classes that do a hundred
> things are really hard to understand and maintain. Small methods, small
> classes whenever possible. I also suspect that there may be some gains in
> performance to be had if we rely more on the container (which will already
> be dispatching based on path) to choose our code paths (at least at a
> coarse level) and then have less custom dispatch logic executed on *every*
> request
> >>>
> >>> Obviously I'm wrong and if the net result is less performant to any
> significant degree forget it that wouldn't be worth it. (wanted:
> standardized solr benchmarks)
> >>>
> >>> There WILL be complications with v2 because it is a subclass of
> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
> should be a separate servlet from v1, but we don't want to duplicate code
> either... so work to do there...
> >>>
> >>> I think an incremental approach is necessary since very few of us have
> the bandwidth to more than that and certainly it becomes difficult to find
> anyone with the time to review large changes. What I have thus far is
> stable with respect to the tests, and simplifies some stuff already which
> is why I chose this point to start the discussion
> >>>
> >>> But yeah, look at what I did and say what you like and what you don't.
> :).
> >>>
> >>> -Gus
> >>>
> >>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
> wrote:
> >>>
> >>> Sounds like an interesting adventure last weekend.
> >>>
> >>> I'm unclear what the point of going this direction is; my instinct is
> to go the opposite direction.  You seem to suggest there are some
> simplification/organization benefits, which I love, so I'll need to look at
> what you've done to judge that for myself.  Yes Jetty supports the Servlet
> spec but we need not embrace it.  Adhering to that is useful if you have a
> generic web app that can be deployed to a container of the user's
> convenience/choosing.  No doubt this is why Solr started this way, and why
> the apps I built in my early days adhered to that spec.  But since 6.0, we
> view Solr as self-contained and more supportable if the project makes these
> decisions and thus not needlessly constrain itself as well.
> >>>
> >>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
> and not a *Servlet* itself.
> >>>
> >>> Also, I suspect there may be complications in changes here relating to
> Solr's v1 vs v2 API.  And most definitely also what you discovered --
> JettySolrRunner.
> >>>
> >>> ~ David Smiley
> >>> Apache Lucene/Solr Search Developer
> >>> http://www.linkedin.com/in/davidwsmiley
> >>>
> >>>
> >>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
> >>>
> >>> TLDR: I've got a working branch where CoreContainer & our startup
> process is extracted from SolrDispatch Filter. Do other folks think that is
> interesting enough that I should make a JIRA and/or PR with intention to
> make this change?
> >>>
> >>> Details:
> >>>
> >>> Jetty is a servlet container, yet we more or less ignore it by
> stuffing everything into a single filter (almost, admin UI is served
> separately). I'm sure there are lots of historical reasons for this,
> probably including servlet containers and their specs were much less mature
> when solr was first started. Maybe also the early authors were more focused
> on making search work than leveraging what the container could do for them
> (but I wasn't there for that so that's just a guess).
> >>>
> >>> The result is that we have a couple of very large classes that are
> touched by almost every request, and they have a lot of conditional logic
> trying to decide what the user is asking. Normally this sort of dispatch is
> done by the container based on the request URL to direct it to the
> appropriate servlet. Solr does a LOT of different things so this code is
> extremely tricky and complex to understand. Specifically, I'm speaking of
> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
> two classes probably makes them harder to understand.
> >>>
> >>> It seems to me (and this mail is asking if you agree) that these
> classes are long overdue for some subdivision. The most obvious thing to
> pull out is all the admin calls. Admin code paths really have little or
> nothing to do with query or update code paths since there are no documents
> to route or sub-requests to some subset of nodes.
> >>>
> >>> The primary obstacle to any such separation and simplification is that
> most requests have some interaction with CoreContainer, and the things it
> holds, and this is initialized and held by a field in SolrDispatchFilter.
> After spending a significant chunk of time reading this code in the prior
> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
> chunk of my weekend into an experiment to see if I could pull CoreContainer
> out of the dispatch filter, and leverage the facilities of our servlet
> container.
> >>>
> >>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
> very confusing, and worrisome since I've realized it's not respecting our
> web.xml at all, and any configuration in web.xml needs to be duplicated for
> our tests in JettySolrRunner (tangent alert)
> >>>
> >>> The result is that CoreContainer is now held by a class called
> CoreService (please help me name things if you don't like my names :) ).
> CoreService is a ServletContextListener, appropriately configured in
> web.xml, and has a static method that can be used to get a reference to the
> CoreContainer corresponding to the ServletContext in which code wanting a
> core container is running (this supports having multiple JettySolrRunners
> in tests, though probably never has more than one CoreContainer in the
> running application)
> >>>
> >>> I achieved this in 4 stages shown here:
> https://github.com/gus-asf/solr/tree/servlet-solr
> >>>
> >>> Ignore the AdminServlet class, it's a placeholder, and can be
> subtracted without harm.
> >>>
> >>> Since the current state of the code in that branch is apparently
> test-stable (4 runs of check in a row passing, none slower than any run of
> 3 runs of main, both as low as 10.5 min if I don't continue working on the
> machine)...
> >>>
> >>> Do we want to push this refactor in now to avoid making a huge ball of
> changes that gets harder and harder to merge? The next push point would
> probably be when AdminServlet was functional (if/when that happens) (and we
> could not push that class for now).
> >>>
> >>> If you read this far, thanks :) I wasn't sure how feasible this would
> be so I felt the need to prove it to my self in code before wasting your
> time, but please don't hesitate to point to costs I might be missing or
> anything that looks like a mistake, or say why this was a total waste of
> time :)
> >>>
> >>> Thoughts?
> >>>
> >>> -Gus
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>> --
> >>> - Mark
> >>>
> >>> http://about.me/markrmiller
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>>
> >>>
> >>> --
> >>> http://www.needhamsoftware.com (work)
> >>> http://www.the111shift.com (play)
> >>>
> >>> --
> >>> - Mark
> >>>
> >>> http://about.me/markrmiller
> >>>
> >>>
> >>
> >>
> >> --
> >> http://www.needhamsoftware.com (work)
> >> http://www.the111shift.com (play)
> >
> > --
> > - Mark
> >
> > http://about.me/markrmiller
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> For additional commands, e-mail: dev-help@solr.apache.org
>
> --
- Mark

http://about.me/markrmiller

Re: Solr and Jetty and Servlets

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
> But a little more effort on those json files that make up V2
I looked into that and during the in-depth testing realised that V2
did not support default values (it writes out current values only),
nor did it have a proper round-trip story, especially for the nested
configurations. So, it was - architecturally - not at parity with the
way Solr did configuration.

I documented my findings in a bunch of JIRAs with the umbrella in
SOLR-14795 but stopped after that, as I could not figure out the way
to do that "little more effort" without reopening big discussions.

Not saying it is not possible (or to derail the parent conversation),
just my two cents on attempting that kind of "a small amount of
effort".

Regards,
   Alex.

On Thu, 19 Aug 2021 at 02:05, Mark Miller <ma...@gmail.com> wrote:
>
> The gap is not ideal and I’m not advocating for it, but it does help keep you honest. There is gap between a mini cluster and a packed dist regardless and it’s really something that should be covered by testing even with more of a lineup.
>
> I’m very into dependency injection in Solr. I was adding juice at the end of branch 2.5. The benefit to testing is well known, but wasn’t even looking at that.
>
> Our class dependencies are a major hindrance. From everywhere, you need access to the CoreContainer. And ZkController. And then make chained calls all over the place to reach back and forth into each other.
>
> Need a zksolrclient? Start chaining method calls until you find it. ZkstateReader? Pretty much anything. You find a friendly root and start reaching back and forth. It’s a very verbose and complicated framework. And with issues depending on the class and implementation.  Because threads are started in constructors and everything is so interdependent, you actually race spinning the wild up.
>
> Getting the construction, and lifecycle and access to all the core widely used classes is delicate work. And is essentially given to everyone, everywhere.
>
> So I built everything at the top, in a guice configuration. Lifecycle and construction and dependencies mostly collocated in the code in one spot. Classes that need access get those objects injected. No looking for, chaining for, constructing a second or third instance for …
>
> And then what I was starting on was, because we have all these complex objects and difficult to understand lifecycle and access, and in many many places - mistakes are made (Someone closes something they shouldn’t. Makes a call they are not supposed to) And so I started working on having most of the key, long lived objects managed by a small core layer of code. And the objects that got injected where getting specialized - only the core lifecycle code can close a zksolrclient. The method doesn’t exist or throws and exception for the request code. Requests can access the resources - easily discoverable, and without the ability to mess up with the unnecessary power they are given.
>
> So just like api V2, you can find a hint of this in the dual api for zk access. And like V2, half baked, undocumented. Team tackled by like one, for a while, to some limited extent. And so again you have an independent idea built on a gold nugget that ends up increasing complexity and variation and surface area and ability to understand. The core of the code always large enough to barrel along in the same direction while some independent, time limited efforts, start pulling the mass wider in objectively reasonable goals. But it’s just a wider mass barreling forward.
>
> Essentially to say, yes. Dependency injection.  You can shrink the world. Give the average developer half a chance.
>
> Request dispatching and the rest api should be the same deal. Out of the way. Done with a focus to shrink the world and remove the power and let search developers focus on search. I look at the V2 api and think, man, it’s like someone saw gold and then went after a nickel. We shouldn’t be in the rest api and dispatching business.
>
> Let’s say you implement an OpenApi or SwaggerApi. You get tools. Web, ide, cmd line.   You get code generation. Server and client. Editors. You get automated documentation creation. You get a standard and compatibility that’s handled by a huge community dedicated to it. When devs have to work on it, it’s simple, it’s documented and tutorialed and talked about - outside of the project and the one or two hobbiests that have taken a passing interest in a rest apia and dispatching.
>
> And V2 seems to look at that and take a bite or two. But leave almost eveything on the table. A lot of code. Very non standard custom, undocumented code. Non of the tool support. Pretty heavy work to even unwind it with few  pointers to help.  Still no standard. Self documenting at a very low level. And with very little of the benefit. When you follow these specs, not only is most of the work down for you beyond specification, but your ide auto completes your urls. Tools auto mock testing. A whole bunch of crap pushed away.
>
> The stuff that Solr is actually good at is the core of the actual search functionality. But the amount of out of scope and poor code surrounding that is what drags in the whole system. And most of it can be kicked right out of the project and into code and specs solved and maintained by people that actually do that work for a living.
>
> Here and there we have some special needs. No doubt. But a little more effort on those json files that make up V2? And at the push of a button you have the basic working code for a full client. For a jetty server implementation. And very pretty, full, exampled, generated documentation. Just a click and a small amount of effort. Even if you say, while the search api, and some of the performance and special case stuff… okay, okay, I think there’s likely a half way point, but let’s just say you take the whole admin api off the table. Beautiful. At least the rest is search related.
>
>
> MRM
>
> On Wed, Aug 18, 2021 at 11:51 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> The thing that concerned me most about JettySolrRunner is that it differed from what we use in production essentially leaving a gap of untested functionality, so if we move one or the other to match I'm pretty much happy. The thing it (and ServeletContextHandler) don't do is start anything else without instructions, which is the good and bad. I've not had any microservice oriented clients so startup time hasn't been a thing I thought much about but all the points above make sense in that light.
>>
>> Since SIP-6 doesn't contemplate anything like replacing jetty/servlets entirely I don't think anything I did here stands in its way, just a slightly longer list of things to instantiate (conveniently documented in web.xml) but code review would be a happy thing in case I missed something on that front.
>>
>> Even what Uwe describes seems compatible, and however we go it would be MUCH better to have the ability to inject dependencies into the things we start. Because of the unit tests we can't ever use any singleton that has state, so that's out and threadlocals are too broad, so without control to inject I was forced to set up a delicate dance with a weak map keyed on the servlet contexts which is not my favorite thing, but it worked.
>>
>> -Gus
>>
>> On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com> wrote:
>>>
>>> Hi,
>>>
>>> Clearly a refactor is needed here, thanks Gus for diving in deep.
>>>
>>> Like Uwe I don't want us to be more dependent upon web.xml or Jetty's startup logic. That's why I filed SIP-6 (https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process) to try to describe the vision that Uwe just described. If such a bootstrap redesign makes the dispatch cleanup easier or harder I don't know. Perhaps one should be done before the other? Or shuold be viewed as part of the same rewrite?
>>>
>>> Jan
>>>
>>> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
>>>
>>> Hi,
>>>
>>> Thanks Mark, I fully agree with you on this: Booting up jetty with start.jar and using a web application is dramatically slower.
>>>
>>> Nevertheless, I think that we should go away with the web application alltogether. We can still stay with the servlet API, but just – as in Gus patch – to split responsibilities and go away with a singleton ServletFilter, although from my own experience, I figured out that Servletapis’s URI handling is far away from ideal. In most cases at end I always landed in a separate “master” servlet (hooked to “/*”) that only parses the PathInfo with regular expressions (one thing that does not work with servlet api) and then delegates using RequestDispatcher to several more specific servlets or returns 404 not found. This setup would also help to split the huge DispatchFilter mess, but still the URI parsing logic would be centralized. But this servlet would only take care of the dispatching, so DispatchServlet would be fine. But it should only dispatch, nothing else.
>>>
>>> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the main distribution and remove start.jar completely. Also remove jetty-webapp.jar and related dependencies. In Boostrap just start Jetty embedded, add listen ip, port numbers, etc. We can still use the config API of Jetty for this, so maybe put this into an XML file (although XML is a bad format for this stuff, goes back to web applications). Then Bootsrap starts the Corecontainer and the embedded Zookeeper (this is another horrible thing in Solr: the embedded zookeper starts somehow as part of the web application) and hooks the servlets into a (at first) single root ServletContextHandler. This is upside down. So Solr starts with a Bootstrap class, starts zookeeper, jetty and hands over to CoreContainer.
>>>
>>> Some other ideas (not sure if they are already implemented in the patch): Maybe we should install a separate ServletContextHandler/Classloader for each Core. This would allow us to go away with our own classloader magic. Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin UI and the Core/Collection management and then hands over to the CoreContainer to initialize all Cores. Each CoreContainer gets his own ServletContextHandler and can therefore easily be loaded unloaded from Jetty.
>>>
>>> I wrote many other apps in my daily live like this, the startup time (especially for microservices) is great. A Jetty starting up in milliseconds and going to service is unbeatable.
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>> https://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>> From: Mark Miller <ma...@gmail.com>
>>> Sent: Wednesday, August 18, 2021 9:08 AM
>>> To: Gus Heck <gu...@gmail.com>
>>> Cc: dev@solr.apache.org
>>> Subject: Re: Solr and Jetty and Servlets
>>>
>>> Agreement or not, I wouldn’t step in front of whatever you did.
>>>
>>> I doubt anyone is going to take on actually pulling out of a web app because it’s easier to lean on it for configuration and I doubt start up speed is a very high priority for most.
>>>
>>> It had never really occurred to me in the past that JettySolrRunner was so much faster. I just know it because of driving the test down to ridiculously times, starting up and stopping mini clusters of whatever size in literally no time. With no QuickStart setup for embedded Jetty.
>>>
>>> So than starting up standalone Solr, it became apparent how much slower it was. So I started poking around. Cloud guys like those at google had already been feeling pain - preferring to be able to spin up and down relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s thing how they were dealing with it. And it’s simple, but it’s still easily non comparable at the end of the day. Solr ships with like 10,000 classes across a ton of jars, even with QuickStart, you have to follow the spec of what to do and what might be done by any given webapp - though as JettySolrRunner shows, we don’t actually need any of it. QuickStart is trying to find every trick to speed up following the container and webapp spec. Injecting the dispatch filter just tosses it all out :)
>>>
>>> Any, may point was not that I disagree and you should make Solr inject a dispatch filter - my point is just that it doesn’t really matter to dispatch. You can do just as much via Jetty either way.
>>>
>>> And anyone that doesn’t agree about “something should be done” is either lying or not looking. So I wouldn’t worry about agreement. The problem is that it’s quicksand that people are avoiding, not that anyone that looks can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted to work towards getting out of that hole. But at the end of the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But look at it - there was hope there. Hope that one day, dispatching would be better. Rather than twice as complicated. I don’t know that hope lives on. But no one gonna stop you on reasonable technical sense with any proper license.
>>>
>>> Mark
>>>
>>>
>>>
>>> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> *ServletContextHandler not ServletContext
>>>
>>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> Ok nope, since that only happens with WebAppContext, not ServletContext (the parent class). whew :)
>>>
>>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus lack of metadata-complete=true in web.xml means that we wind up scanning for annotations?
>>>
>>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-15590
>>>
>>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> Ok well it would be interesting to compare quickstarting vs JettySolrRunner, and reading up on quickstart gives ideas about pre-building the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either way we are still in a container and I think I hear some agreement that something should be done about the dispatch here, and both of you seem to agree that an actual servlet would make more sense than a filter, so I'll make a ticket/pr to make it easier to track & easier to read the code.
>>>
>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com> wrote:
>>>
>>> The downside to respecting web.xml and making JettySolrServer serve a webapp is that loading a webapp is very expensive and slow in comparison.  JettySolrServer actually starts up extremely quickly. It’s almost more appealing to change the Server to use the JettySolrServer strategy. It’s so slow to load a webapp because of all the things it needs to support and scan jars for - in a kind of JavaEE situation, though not as bad. Jetty QuickStart does improve the situation if used at least. But there is really no need to eat the whole webapp standard for a non webapp app to have Jetty manage more of the dispatching. I always wondering about the motivation / upside of hacking in a straight servlet myself. But for a non webapp stuck in a servlet container, it’s actually a beautiful move that side steps a bunch of slow crap even better than the QuickStart pushes ever will, and still allows for matching any support we would want.
>>>
>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I made a slim SolrCall base class that each extends. All that logic is hairy enough to follow without them interleaving and sharing in a kind of wild collaboration. It’s pretty unfortunate they both still exist.
>>>
>>> You are right, leaving that path in that state is a poor idea. If there was anything that could be considered the “hot” path, it’s right there - feverishly checking and ripping up streams for anyone and anything. Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path down? Do a dance? Treat the string like a date and see if that works.
>>>
>>> So yeah, agreed, nobody does or would dispatch this way, you have to frog boil into it. It’s slow, almost incomprehensible, and incredibly good at holding onto life, even building on it. But being a webapp and parsing web.xml has little to do with dispatching and having sensible http api that doesn’t work like it’s a perl compiler.
>>>
>>> Anyway, I can’t imagine trying to trace through that code with any solid feeling about having a handle on it via inspection. But just like Perl, if you debug / trace log the hell out of it, it is all actually pretty simple to adjust and reign in.
>>>
>>>
>>> MRM
>>>
>>>
>>>
>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> So I'm not interested in *adhering* to anything here, just using stuff where it helps... I see tools sitting on the shelf, already bought and paid for and carried with us to every job but then they sit in the back of the truck mostly unused... and (I think) they look useful.
>>>
>>> This should not be seen as in any way advocating a return to war file deployment or "choose your container". All of what I would suggest should likely be compatible with a jetty startup controlled by our code (I assume we can convince it to read web.xml properly when doing so). Certainly point out anything that would interfere with that.
>>>
>>> Here we sit in a container that is a tool box containing:
>>>
>>> A nice mechanism for initializing stuff at the application level, (ServletContextListener) and a well defined guarantee that it runs before any filter or servlet,
>>> An automatic shutdown mechanism that it will call for us on graceful shutdown (ServletContextListner again).
>>> A nice layered filter chain mechanism which could have been used to layer in things like tracing and authentication, and close shields etc as small succinct filters rather than weaving them into an ever more complex filter & call class.
>>> In more recent versions of the spec, for listeners defined in web.xml the order is also guaranteed.
>>> Servlet classes that are *already* set up to distinguish http verbs automatically when desired
>>>
>>> So why is this better? Because monster classes that do a hundred things are really hard to understand and maintain. Small methods, small classes whenever possible. I also suspect that there may be some gains in performance to be had if we rely more on the container (which will already be dispatching based on path) to choose our code paths (at least at a coarse level) and then have less custom dispatch logic executed on *every* request
>>>
>>> Obviously I'm wrong and if the net result is less performant to any significant degree forget it that wouldn't be worth it. (wanted: standardized solr benchmarks)
>>>
>>> There WILL be complications with v2 because it is a subclass of HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it should be a separate servlet from v1, but we don't want to duplicate code either... so work to do there...
>>>
>>> I think an incremental approach is necessary since very few of us have the bandwidth to more than that and certainly it becomes difficult to find anyone with the time to review large changes. What I have thus far is stable with respect to the tests, and simplifies some stuff already which is why I chose this point to start the discussion
>>>
>>> But yeah, look at what I did and say what you like and what you don't. :).
>>>
>>> -Gus
>>>
>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>>>
>>> Sounds like an interesting adventure last weekend.
>>>
>>> I'm unclear what the point of going this direction is; my instinct is to go the opposite direction.  You seem to suggest there are some simplification/organization benefits, which I love, so I'll need to look at what you've done to judge that for myself.  Yes Jetty supports the Servlet spec but we need not embrace it.  Adhering to that is useful if you have a generic web app that can be deployed to a container of the user's convenience/choosing.  No doubt this is why Solr started this way, and why the apps I built in my early days adhered to that spec.  But since 6.0, we view Solr as self-contained and more supportable if the project makes these decisions and thus not needlessly constrain itself as well.
>>>
>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and not a *Servlet* itself.
>>>
>>> Also, I suspect there may be complications in changes here relating to Solr's v1 vs v2 API.  And most definitely also what you discovered -- JettySolrRunner.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>> TLDR: I've got a working branch where CoreContainer & our startup process is extracted from SolrDispatch Filter. Do other folks think that is interesting enough that I should make a JIRA and/or PR with intention to make this change?
>>>
>>> Details:
>>>
>>> Jetty is a servlet container, yet we more or less ignore it by stuffing everything into a single filter (almost, admin UI is served separately). I'm sure there are lots of historical reasons for this, probably including servlet containers and their specs were much less mature when solr was first started. Maybe also the early authors were more focused on making search work than leveraging what the container could do for them (but I wasn't there for that so that's just a guess).
>>>
>>> The result is that we have a couple of very large classes that are touched by almost every request, and they have a lot of conditional logic trying to decide what the user is asking. Normally this sort of dispatch is done by the container based on the request URL to direct it to the appropriate servlet. Solr does a LOT of different things so this code is extremely tricky and complex to understand. Specifically, I'm speaking of SolrDispatchFilter and HttpSolrCall, which are so inseparable that being two classes probably makes them harder to understand.
>>>
>>> It seems to me (and this mail is asking if you agree) that these classes are long overdue for some subdivision. The most obvious thing to pull out is all the admin calls. Admin code paths really have little or nothing to do with query or update code paths since there are no documents to route or sub-requests to some subset of nodes.
>>>
>>> The primary obstacle to any such separation and simplification is that most requests have some interaction with CoreContainer, and the things it holds, and this is initialized and held by a field in SolrDispatchFilter. After spending a significant chunk of time reading this code in the prior weeks and a timely and motivating conversation with Eric Pugh, I dumped a chunk of my weekend into an experiment to see if I could pull CoreContainer out of the dispatch filter, and leverage the facilities of our servlet container.
>>>
>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very confusing, and worrisome since I've realized it's not respecting our web.xml at all, and any configuration in web.xml needs to be duplicated for our tests in JettySolrRunner (tangent alert)
>>>
>>> The result is that CoreContainer is now held by a class called CoreService (please help me name things if you don't like my names :) ). CoreService is a ServletContextListener, appropriately configured in web.xml, and has a static method that can be used to get a reference to the CoreContainer corresponding to the ServletContext in which code wanting a core container is running (this supports having multiple JettySolrRunners in tests, though probably never has more than one CoreContainer in the running application)
>>>
>>> I achieved this in 4 stages shown here: https://github.com/gus-asf/solr/tree/servlet-solr
>>>
>>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted without harm.
>>>
>>> Since the current state of the code in that branch is apparently test-stable (4 runs of check in a row passing, none slower than any run of 3 runs of main, both as low as 10.5 min if I don't continue working on the machine)...
>>>
>>> Do we want to push this refactor in now to avoid making a huge ball of changes that gets harder and harder to merge? The next push point would probably be when AdminServlet was functional (if/when that happens) (and we could not push that class for now).
>>>
>>> If you read this far, thanks :) I wasn't sure how feasible this would be so I felt the need to prove it to my self in code before wasting your time, but please don't hesitate to point to costs I might be missing or anything that looks like a mistake, or say why this was a total waste of time :)
>>>
>>> Thoughts?
>>>
>>> -Gus
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>
> --
> - Mark
>
> http://about.me/markrmiller

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org


Re: Solr and Jetty and Servlets

Posted by Mark Miller <ma...@gmail.com>.
The gap is not ideal and I’m not advocating for it, but it does help keep
you honest. There is gap between a mini cluster and a packed dist
regardless and it’s really something that should be covered by testing even
with more of a lineup.

I’m very into dependency injection in Solr. I was adding juice at the end
of branch 2.5. The benefit to testing is well known, but wasn’t even
looking at that.

Our class dependencies are a major hindrance. From everywhere, you need
access to the CoreContainer. And ZkController. And then make chained calls
all over the place to reach back and forth into each other.

Need a zksolrclient? Start chaining method calls until you find it.
ZkstateReader? Pretty much anything. You find a friendly root and start
reaching back and forth. It’s a very verbose and complicated framework. And
with issues depending on the class and implementation.  Because threads are
started in constructors and everything is so interdependent, you actually
race spinning the wild up.

Getting the construction, and lifecycle and access to all the core widely
used classes is delicate work. And is essentially given to everyone,
everywhere.

So I built everything at the top, in a guice configuration. Lifecycle and
construction and dependencies mostly collocated in the code in one spot.
Classes that need access get those objects injected. No looking for,
chaining for, constructing a second or third instance for …

And then what I was starting on was, because we have all these complex
objects and difficult to understand lifecycle and access, and in many many
places - mistakes are made (Someone closes something they shouldn’t. Makes
a call they are not supposed to) And so I started working on having most of
the key, long lived objects managed by a small core layer of code. And the
objects that got injected where getting specialized - only the core
lifecycle code can close a zksolrclient. The method doesn’t exist or throws
and exception for the request code. Requests can access the resources -
easily discoverable, and without the ability to mess up with the
unnecessary power they are given.

So just like api V2, you can find a hint of this in the dual api for zk
access. And like V2, half baked, undocumented. Team tackled by like one,
for a while, to some limited extent. And so again you have an independent
idea built on a gold nugget that ends up increasing complexity and
variation and surface area and ability to understand. The core of the code
always large enough to barrel along in the same direction while some
independent, time limited efforts, start pulling the mass wider in
objectively reasonable goals. But it’s just a wider mass barreling forward.

Essentially to say, yes. Dependency injection.  You can shrink the world.
Give the average developer half a chance.

Request dispatching and the rest api should be the same deal. Out of the
way. Done with a focus to shrink the world and remove the power and let
search developers focus on search. I look at the V2 api and think, man,
it’s like someone saw gold and then went after a nickel. We shouldn’t be in
the rest api and dispatching business.

Let’s say you implement an OpenApi or SwaggerApi. You get tools. Web, ide,
cmd line.   You get code generation. Server and client. Editors. You get
automated documentation creation. You get a standard and compatibility
that’s handled by a huge community dedicated to it. When devs have to work
on it, it’s simple, it’s documented and tutorialed and talked about -
outside of the project and the one or two hobbiests that have taken a
passing interest in a rest apia and dispatching.

And V2 seems to look at that and take a bite or two. But leave almost
eveything on the table. A lot of code. Very non standard custom,
undocumented code. Non of the tool support. Pretty heavy work to even
unwind it with few  pointers to help.  Still no standard. Self documenting
at a very low level. And with very little of the benefit. When you follow
these specs, not only is most of the work down for you beyond
specification, but your ide auto completes your urls. Tools auto mock
testing. A whole bunch of crap pushed away.

The stuff that Solr is actually good at is the core of the actual search
functionality. But the amount of out of scope and poor code surrounding
that is what drags in the whole system. And most of it can be kicked right
out of the project and into code and specs solved and maintained by people
that actually do that work for a living.

Here and there we have some special needs. No doubt. But a little more
effort on those json files that make up V2? And at the push of a button you
have the basic working code for a full client. For a jetty server
implementation. And very pretty, full, exampled, generated documentation.
Just a click and a small amount of effort. Even if you say, while the
search api, and some of the performance and special case stuff… okay, okay,
I think there’s likely a half way point, but let’s just say you take the
whole admin api off the table. Beautiful. At least the rest is search
related.


MRM

On Wed, Aug 18, 2021 at 11:51 AM Gus Heck <gu...@gmail.com> wrote:

> The thing that concerned me most about JettySolrRunner is that it differed
> from what we use in production essentially leaving a gap of untested
> functionality, so if we move one or the other to match I'm pretty much
> happy. The thing it (and ServeletContextHandler) don't do is start anything
> else without instructions, which is the good and bad. I've not had any
> microservice oriented clients so startup time hasn't been a thing I thought
> much about but all the points above make sense in that light.
>
> Since SIP-6 doesn't contemplate anything like replacing jetty/servlets
> entirely I don't think anything I did here stands in its way, just a
> slightly longer list of things to instantiate (conveniently documented in
> web.xml) but code review would be a happy thing in case I missed something
> on that front.
>
> Even what Uwe describes seems compatible, and however we go it would be
> MUCH better to have the ability to inject dependencies into the things we
> start. Because of the unit tests we can't ever use any singleton that has
> state, so that's out and threadlocals are too broad, so without control to
> inject I was forced to set up a delicate dance with a weak map keyed on the
> servlet contexts which is not my favorite thing, but it worked.
>
> -Gus
>
> On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com>
> wrote:
>
>> Hi,
>>
>> Clearly a refactor is needed here, thanks Gus for diving in deep.
>>
>> Like Uwe I don't want us to be more dependent upon web.xml or Jetty's
>> startup logic. That's why I filed SIP-6 (
>> https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process)
>> to try to describe the vision that Uwe just described. If such a bootstrap
>> redesign makes the dispatch cleanup easier or harder I don't know. Perhaps
>> one should be done before the other? Or shuold be viewed as part of the
>> same rewrite?
>>
>> Jan
>>
>> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
>>
>> Hi,
>>
>> Thanks Mark, I fully agree with you on this: Booting up jetty with
>> start.jar and using a web application is dramatically slower.
>>
>> Nevertheless, I think that we should go away with the web application
>> alltogether. We can still stay with the servlet API, but just – as in Gus
>> patch – to split responsibilities and go away with a singleton
>> ServletFilter, although from my own experience, I figured out that
>> Servletapis’s URI handling is far away from ideal. In most cases at end I
>> always landed in a separate “master” servlet (hooked to “/*”) that only
>> parses the PathInfo with regular expressions (one thing that does not work
>> with servlet api) and then delegates using RequestDispatcher to several
>> more specific servlets or returns 404 not found. This setup would also help
>> to split the huge DispatchFilter mess, but still the URI parsing logic
>> would be centralized. But this servlet would only take care of the
>> dispatching, so DispatchServlet would be fine. But it should only dispatch,
>> nothing else.
>>
>> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the
>> main distribution and remove start.jar completely. Also remove
>> jetty-webapp.jar and related dependencies. In Boostrap just start Jetty
>> embedded, add listen ip, port numbers, etc. We can still use the config API
>> of Jetty for this, so maybe put this into an XML file (although XML is a
>> bad format for this stuff, goes back to web applications). Then Bootsrap
>> starts the Corecontainer and the embedded Zookeeper (this is another
>> horrible thing in Solr: the embedded zookeper starts somehow as part of the
>> web application) and hooks the servlets into a (at first) single root
>> ServletContextHandler. This is upside down. So Solr starts with a Bootstrap
>> class, starts zookeeper, jetty and hands over to CoreContainer.
>>
>> Some other ideas (not sure if they are already implemented in the patch):
>> Maybe we should install a separate ServletContextHandler/Classloader for
>> each Core. This would allow us to go away with our own classloader magic.
>> Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty
>> Installs a servlet for the admin UI and the Core/Collection management and
>> then hands over to the CoreContainer to initialize all Cores. Each
>> CoreContainer gets his own ServletContextHandler and can therefore easily
>> be loaded unloaded from Jetty.
>>
>> I wrote many other apps in my daily live like this, the startup time
>> (especially for microservices) is great. A Jetty starting up in
>> milliseconds and going to service is unbeatable.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> <https://www.google.com/maps/search/Achterdiek+19,+D-28357+Bremen?entry=gmail&source=g>
>> https://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> *From:* Mark Miller <ma...@gmail.com>
>> *Sent:* Wednesday, August 18, 2021 9:08 AM
>> *To:* Gus Heck <gu...@gmail.com>
>> *Cc:* dev@solr.apache.org
>> *Subject:* Re: Solr and Jetty and Servlets
>>
>> Agreement or not, I wouldn’t step in front of whatever you did.
>>
>> I doubt anyone is going to take on actually pulling out of a web app
>> because it’s easier to lean on it for configuration and I doubt start up
>> speed is a very high priority for most.
>>
>> It had never really occurred to me in the past that JettySolrRunner was
>> so much faster. I just know it because of driving the test down to
>> ridiculously times, starting up and stopping mini clusters of whatever size
>> in literally no time. With no QuickStart setup for embedded Jetty.
>>
>> So than starting up standalone Solr, it became apparent how much slower
>> it was. So I started poking around. Cloud guys like those at google had
>> already been feeling pain - preferring to be able to spin up and down
>> relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s
>> thing how they were dealing with it. And it’s simple, but it’s still easily
>> non comparable at the end of the day. Solr ships with like 10,000 classes
>> across a ton of jars, even with QuickStart, you have to follow the spec of
>> what to do and what might be done by any given webapp - though as
>> JettySolrRunner shows, we don’t actually need any of it. QuickStart is
>> trying to find every trick to speed up following the container and webapp
>> spec. Injecting the dispatch filter just tosses it all out :)
>>
>> Any, may point was not that I disagree and you should make Solr inject a
>> dispatch filter - my point is just that it doesn’t really matter to
>> dispatch. You can do just as much via Jetty either way.
>>
>> And anyone that doesn’t agree about “something should be done” is either
>> lying or not looking. So I wouldn’t worry about agreement. The problem is
>> that it’s quicksand that people are avoiding, not that anyone that looks
>> can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted
>> to work towards getting out of that hole. But at the end of the day, it’s
>> now part of the hole. Made it deeper. Bermuda Triangle. But look at it -
>> there was hope there. Hope that one day, dispatching would be better.
>> Rather than twice as complicated. I don’t know that hope lives on. But no
>> one gonna stop you on reasonable technical sense with any proper license.
>>
>> Mark
>>
>>
>>
>> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> *ServletContextHandler not ServletContext
>>
>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> Ok nope, since that only happens with WebAppContext, not ServletContext
>> (the parent class). whew :)
>>
>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
>> thus lack of metadata-complete=true in web.xml means that we wind up
>> scanning for annotations?
>>
>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> https://issues.apache.org/jira/browse/SOLR-15590
>>
>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> Ok well it would be interesting to compare quickstarting vs
>> JettySolrRunner, and reading up on quickstart
>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>> tangent. Either way we are still in a container and I think I hear some
>> agreement that something should be done about the dispatch here, and both
>> of you seem to agree that an actual servlet would make more sense than a
>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>> read the code.
>>
>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
>> wrote:
>>
>> The downside to respecting web.xml and making JettySolrServer serve a
>> webapp is that loading a webapp is very expensive and slow in comparison.
>> JettySolrServer actually starts up extremely quickly. It’s almost more
>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>> slow to load a webapp because of all the things it needs to support and
>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>> QuickStart does improve the situation if used at least. But there is really
>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>> manage more of the dispatching. I always wondering about the motivation /
>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>> in a servlet container, it’s actually a beautiful move that side steps a
>> bunch of slow crap even better than the QuickStart pushes ever will, and
>> still allows for matching any support we would want.
>>
>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>> made a slim SolrCall base class that each extends. All that logic is hairy
>> enough to follow without them interleaving and sharing in a kind of wild
>> collaboration. It’s pretty unfortunate they both still exist.
>>
>> You are right, leaving that path in that state is a poor idea. If there
>> was anything that could be considered the “hot” path, it’s right there -
>> feverishly checking and ripping up streams for anyone and anything. Is a
>> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
>> down? Do a dance? Treat the string like a date and see if that works.
>>
>> So yeah, agreed, nobody does or would dispatch this way, you have to frog
>> boil into it. It’s slow, almost incomprehensible, and incredibly good at
>> holding onto life, even building on it. But being a webapp and parsing
>> web.xml has little to do with dispatching and having sensible http api that
>> doesn’t work like it’s a perl compiler.
>>
>> Anyway, I can’t imagine trying to trace through that code with any solid
>> feeling about having a handle on it via inspection. But just like Perl, if
>> you debug / trace log the hell out of it, it is all actually pretty simple
>> to adjust and reign in.
>>
>>
>> MRM
>>
>>
>>
>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>
>> So I'm not interested in *adhering* to anything here, just using stuff
>> where it helps... I see tools sitting on the shelf, already bought and paid
>> for and carried with us to every job but then they sit in the back of the
>> truck mostly unused... and (I think) they look useful.
>>
>> This should not be seen as in any way advocating a return to war file
>> deployment or "choose your container". All of what I would suggest should
>> likely be compatible with a jetty startup controlled by our code (I assume
>> we can convince it to read web.xml properly when doing so). Certainly point
>> out anything that would interfere with that.
>>
>> Here we sit in a container that is a tool box containing:
>>
>>    - A nice mechanism for initializing stuff at the application level,
>>    (ServletContextListener) and a well defined guarantee that it runs before
>>    any filter or servlet,
>>    - An automatic shutdown mechanism that it will call for us on
>>    graceful shutdown (ServletContextListner again).
>>    - A nice layered filter chain mechanism which could have been used to
>>    layer in things like tracing and authentication, and close shields etc as
>>    small succinct filters rather than weaving them into an ever more complex
>>    filter & call class.
>>    - In more recent versions of the spec, for listeners defined in
>>    web.xml the order is also guaranteed.
>>    - Servlet classes that are *already* set up to distinguish http verbs
>>    automatically when desired
>>
>> So why is this better? Because monster classes that do a hundred things
>> are really hard to understand and maintain. Small methods, small classes
>> whenever possible. I also suspect that there may be some gains in
>> performance to be had if we rely more on the container (which will already
>> be dispatching based on path) to choose our code paths (at least at a
>> coarse level) and then have less custom dispatch logic executed on *every*
>> request
>>
>> Obviously I'm wrong and if the net result is less performant to any
>> significant degree forget it that wouldn't be worth it. (wanted:
>> standardized solr benchmarks)
>>
>> There WILL be complications with v2 because it is a subclass of
>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>> should be a separate servlet from v1, but we don't want to duplicate code
>> either... so work to do there...
>>
>> I think an incremental approach is necessary since very few of us have
>> the bandwidth to more than that and certainly it becomes difficult to find
>> anyone with the time to review large changes. What I have thus far is
>> stable with respect to the tests, and simplifies some stuff already which
>> is why I chose this point to start the discussion
>>
>> But yeah, look at what I did and say what you like and what you don't.
>> :).
>>
>> -Gus
>>
>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>>
>> Sounds like an interesting adventure last weekend.
>>
>> I'm unclear what the point of going this direction is; my instinct is to
>> go the opposite direction.  You seem to suggest there are some
>> simplification/organization benefits, which I love, so I'll need to look at
>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>> spec but we need not embrace it.  Adhering to that is useful if you have a
>> generic web app that can be deployed to a container of the user's
>> convenience/choosing.  No doubt this is why Solr started this way, and why
>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>> view Solr as self-contained and more supportable if the project makes these
>> decisions and thus not needlessly constrain itself as well.
>>
>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
>> not a *Servlet* itself.
>>
>> Also, I suspect there may be complications in changes here relating to
>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>> JettySolrRunner.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> *TLDR:* I've got a working branch
>> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer
>> & our startup process is extracted from SolrDispatch Filter. Do other folks
>> think that is interesting enough that I should make a JIRA and/or PR with
>> intention to make this change?
>>
>> *Details:*
>>
>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>> everything into a single filter (almost, admin UI is served separately).
>> I'm sure there are lots of historical reasons for this, probably including
>> servlet containers and their specs were much less mature when solr was
>> first started. Maybe also the early authors were more focused on making
>> search work than leveraging what the container could do for them (but I
>> wasn't there for that so that's just a guess).
>>
>> The result is that we have a couple of very large classes that are
>> touched by almost every request, and they have a lot of conditional logic
>> trying to decide what the user is asking. Normally this sort of dispatch is
>> done by the container based on the request URL to direct it to the
>> appropriate servlet. Solr does a LOT of different things so this code is
>> extremely tricky and complex to understand. Specifically, I'm speaking of
>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>> two classes probably makes them harder to understand.
>>
>> It seems to me (and this mail is asking if you agree) that these classes
>> are long overdue for some subdivision. The most obvious thing to pull out
>> is all the admin calls. Admin code paths really have little or nothing to
>> do with query or update code paths since there are no documents to route or
>> sub-requests to some subset of nodes.
>>
>> The primary obstacle to any such separation and simplification is that
>> most requests have some interaction with CoreContainer, and the things it
>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>> After spending a significant chunk of time reading this code in the prior
>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>> out of the dispatch filter, and leverage the facilities of our servlet
>> container.
>>
>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very
>> confusing, and worrisome since I've realized it's not respecting our
>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>> our tests in JettySolrRunner (tangent alert)
>>
>> The result is that CoreContainer is now held by a class called
>> CoreService (please help me name things if you don't like my names :) ).
>> CoreService is a ServletContextListener, appropriately configured in
>> web.xml, and has a static method that can be used to get a reference to the
>> CoreContainer corresponding to the ServletContext in which code wanting a
>> core container is running (this supports having multiple JettySolrRunners
>> in tests, though probably never has more than one CoreContainer in the
>> running application)
>>
>> I achieved this in 4 stages shown here:
>> https://github.com/gus-asf/solr/tree/servlet-solr
>>
>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
>> without harm.
>>
>> Since the current state of the code in that branch is apparently
>> test-stable (4 runs of check in a row passing, none slower than any run of
>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>> machine)...
>>
>> Do we want to push this refactor in now to avoid making a huge ball of
>> changes that gets harder and harder to merge? The next push point would
>> probably be when AdminServlet was functional (if/when that happens) (and we
>> could not push that class for now).
>>
>> If you read this far, thanks :) I wasn't sure how feasible this would be
>> so I felt the need to prove it to my self in code before wasting your time,
>> but please don't hesitate to point to costs I might be missing or anything
>> that looks like a mistake, or say why this was a total waste of time :)
>>
>> Thoughts?
>>
>> -Gus
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
-- 
- Mark

http://about.me/markrmiller

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
The thing that concerned me most about JettySolrRunner is that it differed
from what we use in production essentially leaving a gap of untested
functionality, so if we move one or the other to match I'm pretty much
happy. The thing it (and ServeletContextHandler) don't do is start anything
else without instructions, which is the good and bad. I've not had any
microservice oriented clients so startup time hasn't been a thing I thought
much about but all the points above make sense in that light.

Since SIP-6 doesn't contemplate anything like replacing jetty/servlets
entirely I don't think anything I did here stands in its way, just a
slightly longer list of things to instantiate (conveniently documented in
web.xml) but code review would be a happy thing in case I missed something
on that front.

Even what Uwe describes seems compatible, and however we go it would be
MUCH better to have the ability to inject dependencies into the things we
start. Because of the unit tests we can't ever use any singleton that has
state, so that's out and threadlocals are too broad, so without control to
inject I was forced to set up a delicate dance with a weak map keyed on the
servlet contexts which is not my favorite thing, but it worked.

-Gus

On Wed, Aug 18, 2021 at 10:03 AM Jan Høydahl <ja...@cominvent.com> wrote:

> Hi,
>
> Clearly a refactor is needed here, thanks Gus for diving in deep.
>
> Like Uwe I don't want us to be more dependent upon web.xml or Jetty's
> startup logic. That's why I filed SIP-6 (
> https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process)
> to try to describe the vision that Uwe just described. If such a bootstrap
> redesign makes the dispatch cleanup easier or harder I don't know. Perhaps
> one should be done before the other? Or shuold be viewed as part of the
> same rewrite?
>
> Jan
>
> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
>
> Hi,
>
> Thanks Mark, I fully agree with you on this: Booting up jetty with
> start.jar and using a web application is dramatically slower.
>
> Nevertheless, I think that we should go away with the web application
> alltogether. We can still stay with the servlet API, but just – as in Gus
> patch – to split responsibilities and go away with a singleton
> ServletFilter, although from my own experience, I figured out that
> Servletapis’s URI handling is far away from ideal. In most cases at end I
> always landed in a separate “master” servlet (hooked to “/*”) that only
> parses the PathInfo with regular expressions (one thing that does not work
> with servlet api) and then delegates using RequestDispatcher to several
> more specific servlets or returns 404 not found. This setup would also help
> to split the huge DispatchFilter mess, but still the URI parsing logic
> would be centralized. But this servlet would only take care of the
> dispatching, so DispatchServlet would be fine. But it should only dispatch,
> nothing else.
>
> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the
> main distribution and remove start.jar completely. Also remove
> jetty-webapp.jar and related dependencies. In Boostrap just start Jetty
> embedded, add listen ip, port numbers, etc. We can still use the config API
> of Jetty for this, so maybe put this into an XML file (although XML is a
> bad format for this stuff, goes back to web applications). Then Bootsrap
> starts the Corecontainer and the embedded Zookeeper (this is another
> horrible thing in Solr: the embedded zookeper starts somehow as part of the
> web application) and hooks the servlets into a (at first) single root
> ServletContextHandler. This is upside down. So Solr starts with a Bootstrap
> class, starts zookeeper, jetty and hands over to CoreContainer.
>
> Some other ideas (not sure if they are already implemented in the patch):
> Maybe we should install a separate ServletContextHandler/Classloader for
> each Core. This would allow us to go away with our own classloader magic.
> Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty
> Installs a servlet for the admin UI and the Core/Collection management and
> then hands over to the CoreContainer to initialize all Cores. Each
> CoreContainer gets his own ServletContextHandler and can therefore easily
> be loaded unloaded from Jetty.
>
> I wrote many other apps in my daily live like this, the startup time
> (especially for microservices) is great. A Jetty starting up in
> milliseconds and going to service is unbeatable.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> *From:* Mark Miller <ma...@gmail.com>
> *Sent:* Wednesday, August 18, 2021 9:08 AM
> *To:* Gus Heck <gu...@gmail.com>
> *Cc:* dev@solr.apache.org
> *Subject:* Re: Solr and Jetty and Servlets
>
> Agreement or not, I wouldn’t step in front of whatever you did.
>
> I doubt anyone is going to take on actually pulling out of a web app
> because it’s easier to lean on it for configuration and I doubt start up
> speed is a very high priority for most.
>
> It had never really occurred to me in the past that JettySolrRunner was so
> much faster. I just know it because of driving the test down to
> ridiculously times, starting up and stopping mini clusters of whatever size
> in literally no time. With no QuickStart setup for embedded Jetty.
>
> So than starting up standalone Solr, it became apparent how much slower it
> was. So I started poking around. Cloud guys like those at google had
> already been feeling pain - preferring to be able to spin up and down
> relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s
> thing how they were dealing with it. And it’s simple, but it’s still easily
> non comparable at the end of the day. Solr ships with like 10,000 classes
> across a ton of jars, even with QuickStart, you have to follow the spec of
> what to do and what might be done by any given webapp - though as
> JettySolrRunner shows, we don’t actually need any of it. QuickStart is
> trying to find every trick to speed up following the container and webapp
> spec. Injecting the dispatch filter just tosses it all out :)
>
> Any, may point was not that I disagree and you should make Solr inject a
> dispatch filter - my point is just that it doesn’t really matter to
> dispatch. You can do just as much via Jetty either way.
>
> And anyone that doesn’t agree about “something should be done” is either
> lying or not looking. So I wouldn’t worry about agreement. The problem is
> that it’s quicksand that people are avoiding, not that anyone that looks
> can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted
> to work towards getting out of that hole. But at the end of the day, it’s
> now part of the hole. Made it deeper. Bermuda Triangle. But look at it -
> there was hope there. Hope that one day, dispatching would be better.
> Rather than twice as complicated. I don’t know that hope lives on. But no
> one gonna stop you on reasonable technical sense with any proper license.
>
> Mark
>
>
>
> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:
>
> *ServletContextHandler not ServletContext
>
> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
>
> Ok nope, since that only happens with WebAppContext, not ServletContext
> (the parent class). whew :)
>
> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>
> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
> thus lack of metadata-complete=true in web.xml means that we wind up
> scanning for annotations?
>
> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>
> https://issues.apache.org/jira/browse/SOLR-15590
>
> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>
> Ok well it would be interesting to compare quickstarting vs
> JettySolrRunner, and reading up on quickstart
> <https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building
> the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either
> way we are still in a container and I think I hear some agreement that
> something should be done about the dispatch here, and both of you seem to
> agree that an actual servlet would make more sense than a filter, so I'll
> make a ticket/pr to make it easier to track & easier to read the code.
>
> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com> wrote:
>
> The downside to respecting web.xml and making JettySolrServer serve a
> webapp is that loading a webapp is very expensive and slow in comparison.
> JettySolrServer actually starts up extremely quickly. It’s almost more
> appealing to change the Server to use the JettySolrServer strategy. It’s so
> slow to load a webapp because of all the things it needs to support and
> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
> QuickStart does improve the situation if used at least. But there is really
> no need to eat the whole webapp standard for a non webapp app to have Jetty
> manage more of the dispatching. I always wondering about the motivation /
> upside of hacking in a straight servlet myself. But for a non webapp stuck
> in a servlet container, it’s actually a beautiful move that side steps a
> bunch of slow crap even better than the QuickStart pushes ever will, and
> still allows for matching any support we would want.
>
> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
> made a slim SolrCall base class that each extends. All that logic is hairy
> enough to follow without them interleaving and sharing in a kind of wild
> collaboration. It’s pretty unfortunate they both still exist.
>
> You are right, leaving that path in that state is a poor idea. If there
> was anything that could be considered the “hot” path, it’s right there -
> feverishly checking and ripping up streams for anyone and anything. Is a
> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
> down? Do a dance? Treat the string like a date and see if that works.
>
> So yeah, agreed, nobody does or would dispatch this way, you have to frog
> boil into it. It’s slow, almost incomprehensible, and incredibly good at
> holding onto life, even building on it. But being a webapp and parsing
> web.xml has little to do with dispatching and having sensible http api that
> doesn’t work like it’s a perl compiler.
>
> Anyway, I can’t imagine trying to trace through that code with any solid
> feeling about having a handle on it via inspection. But just like Perl, if
> you debug / trace log the hell out of it, it is all actually pretty simple
> to adjust and reign in.
>
>
> MRM
>
>
>
> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>
> So I'm not interested in *adhering* to anything here, just using stuff
> where it helps... I see tools sitting on the shelf, already bought and paid
> for and carried with us to every job but then they sit in the back of the
> truck mostly unused... and (I think) they look useful.
>
> This should not be seen as in any way advocating a return to war file
> deployment or "choose your container". All of what I would suggest should
> likely be compatible with a jetty startup controlled by our code (I assume
> we can convince it to read web.xml properly when doing so). Certainly point
> out anything that would interfere with that.
>
> Here we sit in a container that is a tool box containing:
>
>    - A nice mechanism for initializing stuff at the application level,
>    (ServletContextListener) and a well defined guarantee that it runs before
>    any filter or servlet,
>    - An automatic shutdown mechanism that it will call for us on graceful
>    shutdown (ServletContextListner again).
>    - A nice layered filter chain mechanism which could have been used to
>    layer in things like tracing and authentication, and close shields etc as
>    small succinct filters rather than weaving them into an ever more complex
>    filter & call class.
>    - In more recent versions of the spec, for listeners defined in
>    web.xml the order is also guaranteed.
>    - Servlet classes that are *already* set up to distinguish http verbs
>    automatically when desired
>
> So why is this better? Because monster classes that do a hundred things
> are really hard to understand and maintain. Small methods, small classes
> whenever possible. I also suspect that there may be some gains in
> performance to be had if we rely more on the container (which will already
> be dispatching based on path) to choose our code paths (at least at a
> coarse level) and then have less custom dispatch logic executed on *every*
> request
>
> Obviously I'm wrong and if the net result is less performant to any
> significant degree forget it that wouldn't be worth it. (wanted:
> standardized solr benchmarks)
>
> There WILL be complications with v2 because it is a subclass of
> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
> should be a separate servlet from v1, but we don't want to duplicate code
> either... so work to do there...
>
> I think an incremental approach is necessary since very few of us have the
> bandwidth to more than that and certainly it becomes difficult to find
> anyone with the time to review large changes. What I have thus far is
> stable with respect to the tests, and simplifies some stuff already which
> is why I chose this point to start the discussion
>
> But yeah, look at what I did and say what you like and what you don't. :).
>
> -Gus
>
> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>
> Sounds like an interesting adventure last weekend.
>
> I'm unclear what the point of going this direction is; my instinct is to
> go the opposite direction.  You seem to suggest there are some
> simplification/organization benefits, which I love, so I'll need to look at
> what you've done to judge that for myself.  Yes Jetty supports the Servlet
> spec but we need not embrace it.  Adhering to that is useful if you have a
> generic web app that can be deployed to a container of the user's
> convenience/choosing.  No doubt this is why Solr started this way, and why
> the apps I built in my early days adhered to that spec.  But since 6.0, we
> view Solr as self-contained and more supportable if the project makes these
> decisions and thus not needlessly constrain itself as well.
>
> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
> not a *Servlet* itself.
>
> Also, I suspect there may be complications in changes here relating to
> Solr's v1 vs v2 API.  And most definitely also what you discovered --
> JettySolrRunner.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>
> *TLDR:* I've got a working branch
> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer &
> our startup process is extracted from SolrDispatch Filter. Do other folks
> think that is interesting enough that I should make a JIRA and/or PR with
> intention to make this change?
>
> *Details:*
>
> Jetty is a servlet container, yet we more or less ignore it by stuffing
> everything into a single filter (almost, admin UI is served separately).
> I'm sure there are lots of historical reasons for this, probably including
> servlet containers and their specs were much less mature when solr was
> first started. Maybe also the early authors were more focused on making
> search work than leveraging what the container could do for them (but I
> wasn't there for that so that's just a guess).
>
> The result is that we have a couple of very large classes that are touched
> by almost every request, and they have a lot of conditional logic trying to
> decide what the user is asking. Normally this sort of dispatch is done by
> the container based on the request URL to direct it to the appropriate
> servlet. Solr does a LOT of different things so this code is extremely
> tricky and complex to understand. Specifically, I'm speaking of
> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
> two classes probably makes them harder to understand.
>
> It seems to me (and this mail is asking if you agree) that these classes
> are long overdue for some subdivision. The most obvious thing to pull out
> is all the admin calls. Admin code paths really have little or nothing to
> do with query or update code paths since there are no documents to route or
> sub-requests to some subset of nodes.
>
> The primary obstacle to any such separation and simplification is that
> most requests have some interaction with CoreContainer, and the things it
> holds, and this is initialized and held by a field in SolrDispatchFilter.
> After spending a significant chunk of time reading this code in the prior
> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
> chunk of my weekend into an experiment to see if I could pull CoreContainer
> out of the dispatch filter, and leverage the facilities of our servlet
> container.
>
> That wasn't too terribly hard, but keeping JettySolrRunner happy was very
> confusing, and worrisome since I've realized it's not respecting our
> web.xml at all, and any configuration in web.xml needs to be duplicated for
> our tests in JettySolrRunner (tangent alert)
>
> The result is that CoreContainer is now held by a class called CoreService
> (please help me name things if you don't like my names :) ). CoreService is
> a ServletContextListener, appropriately configured in web.xml, and has a
> static method that can be used to get a reference to the CoreContainer
> corresponding to the ServletContext in which code wanting a core container
> is running (this supports having multiple JettySolrRunners in tests, though
> probably never has more than one CoreContainer in the running application)
>
> I achieved this in 4 stages shown here:
> https://github.com/gus-asf/solr/tree/servlet-solr
>
> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
> without harm.
>
> Since the current state of the code in that branch is apparently
> test-stable (4 runs of check in a row passing, none slower than any run of
> 3 runs of main, both as low as 10.5 min if I don't continue working on the
> machine)...
>
> Do we want to push this refactor in now to avoid making a huge ball of
> changes that gets harder and harder to merge? The next push point would
> probably be when AdminServlet was functional (if/when that happens) (and we
> could not push that class for now).
>
> If you read this far, thanks :) I wasn't sure how feasible this would be
> so I felt the need to prove it to my self in code before wasting your time,
> but please don't hesitate to point to costs I might be missing or anything
> that looks like a mistake, or say why this was a total waste of time :)
>
> Thoughts?
>
> -Gus
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
> --
> - Mark
>
> http://about.me/markrmiller
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
> --
> - Mark
>
> http://about.me/markrmiller
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Clearly a refactor is needed here, thanks Gus for diving in deep.

Like Uwe I don't want us to be more dependent upon web.xml or Jetty's startup logic. That's why I filed SIP-6 (https://cwiki.apache.org/confluence/display/SOLR/SIP-6+Solr+should+own+the+bootstrap+process) to try to describe the vision that Uwe just described. If such a bootstrap redesign makes the dispatch cleanup easier or harder I don't know. Perhaps one should be done before the other? Or shuold be viewed as part of the same rewrite?

Jan

> 18. aug. 2021 kl. 12:00 skrev Uwe Schindler <uw...@thetaphi.de>:
> 
> Hi,
>  
> Thanks Mark, I fully agree with you on this: Booting up jetty with start.jar and using a web application is dramatically slower.
>  
> Nevertheless, I think that we should go away with the web application alltogether. We can still stay with the servlet API, but just – as in Gus patch – to split responsibilities and go away with a singleton ServletFilter, although from my own experience, I figured out that Servletapis’s URI handling is far away from ideal. In most cases at end I always landed in a separate “master” servlet (hooked to “/*”) that only parses the PathInfo with regular expressions (one thing that does not work with servlet api) and then delegates using RequestDispatcher to several more specific servlets or returns 404 not found. This setup would also help to split the huge DispatchFilter mess, but still the URI parsing logic would be centralized. But this servlet would only take care of the dispatching, so DispatchServlet would be fine. But it should only dispatch, nothing else.
>  
> My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the main distribution and remove start.jar completely. Also remove jetty-webapp.jar and related dependencies. In Boostrap just start Jetty embedded, add listen ip, port numbers, etc. We can still use the config API of Jetty for this, so maybe put this into an XML file (although XML is a bad format for this stuff, goes back to web applications). Then Bootsrap starts the Corecontainer and the embedded Zookeeper (this is another horrible thing in Solr: the embedded zookeper starts somehow as part of the web application) and hooks the servlets into a (at first) single root ServletContextHandler. This is upside down. So Solr starts with a Bootstrap class, starts zookeeper, jetty and hands over to CoreContainer.
>  
> Some other ideas (not sure if they are already implemented in the patch): Maybe we should install a separate ServletContextHandler/Classloader for each Core. This would allow us to go away with our own classloader magic. Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin UI and the Core/Collection management and then hands over to the CoreContainer to initialize all Cores. Each CoreContainer gets his own ServletContextHandler and can therefore easily be loaded unloaded from Jetty.
>  
> I wrote many other apps in my daily live like this, the startup time (especially for microservices) is great. A Jetty starting up in milliseconds and going to service is unbeatable.
>  
> Uwe
>  
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de <https://www.thetaphi.de/>
> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>  
> From: Mark Miller <ma...@gmail.com> 
> Sent: Wednesday, August 18, 2021 9:08 AM
> To: Gus Heck <gu...@gmail.com>
> Cc: dev@solr.apache.org
> Subject: Re: Solr and Jetty and Servlets
>  
> Agreement or not, I wouldn’t step in front of whatever you did.
>  
> I doubt anyone is going to take on actually pulling out of a web app because it’s easier to lean on it for configuration and I doubt start up speed is a very high priority for most.
>  
> It had never really occurred to me in the past that JettySolrRunner was so much faster. I just know it because of driving the test down to ridiculously times, starting up and stopping mini clusters of whatever size in literally no time. With no QuickStart setup for embedded Jetty.
>  
> So than starting up standalone Solr, it became apparent how much slower it was. So I started poking around. Cloud guys like those at google had already been feeling pain - preferring to be able to spin up and down relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s thing how they were dealing with it. And it’s simple, but it’s still easily non comparable at the end of the day. Solr ships with like 10,000 classes across a ton of jars, even with QuickStart, you have to follow the spec of what to do and what might be done by any given webapp - though as JettySolrRunner shows, we don’t actually need any of it. QuickStart is trying to find every trick to speed up following the container and webapp spec. Injecting the dispatch filter just tosses it all out :)
>  
> Any, may point was not that I disagree and you should make Solr inject a dispatch filter - my point is just that it doesn’t really matter to dispatch. You can do just as much via Jetty either way.
>  
> And anyone that doesn’t agree about “something should be done” is either lying or not looking. So I wouldn’t worry about agreement. The problem is that it’s quicksand that people are avoiding, not that anyone that looks can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted to work towards getting out of that hole. But at the end of the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But look at it - there was hope there. Hope that one day, dispatching would be better. Rather than twice as complicated. I don’t know that hope lives on. But no one gonna stop you on reasonable technical sense with any proper license.
>  
> Mark
>  
>  
>  
> On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>> *ServletContextHandler not ServletContext 
>>  
>> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>> Ok nope, since that only happens with WebAppContext, not ServletContext (the parent class). whew :)
>>>  
>>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus lack of metadata-complete=true in web.xml means that we wind up scanning for annotations? 
>>>>  
>>>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>>>> https://issues.apache.org/jira/browse/SOLR-15590 <https://issues.apache.org/jira/browse/SOLR-15590>
>>>>>  
>>>>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>>>>> Ok well it would be interesting to compare quickstarting vs JettySolrRunner, and reading up on quickstart <https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either way we are still in a container and I think I hear some agreement that something should be done about the dispatch here, and both of you seem to agree that an actual servlet would make more sense than a filter, so I'll make a ticket/pr to make it easier to track & easier to read the code.
>>>>>>  
>>>>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <markrmiller@gmail.com <ma...@gmail.com>> wrote:
>>>>>>> The downside to respecting web.xml and making JettySolrServer serve a webapp is that loading a webapp is very expensive and slow in comparison.  JettySolrServer actually starts up extremely quickly. It’s almost more appealing to change the Server to use the JettySolrServer strategy. It’s so slow to load a webapp because of all the things it needs to support and scan jars for - in a kind of JavaEE situation, though not as bad. Jetty QuickStart does improve the situation if used at least. But there is really no need to eat the whole webapp standard for a non webapp app to have Jetty manage more of the dispatching. I always wondering about the motivation / upside of hacking in a straight servlet myself. But for a non webapp stuck in a servlet container, it’s actually a beautiful move that side steps a bunch of slow crap even better than the QuickStart pushes ever will, and still allows for matching any support we would want.
>>>>>>>  
>>>>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I made a slim SolrCall base class that each extends. All that logic is hairy enough to follow without them interleaving and sharing in a kind of wild collaboration. It’s pretty unfortunate they both still exist.
>>>>>>>  
>>>>>>> You are right, leaving that path in that state is a poor idea. If there was anything that could be considered the “hot” path, it’s right there - feverishly checking and ripping up streams for anyone and anything. Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path down? Do a dance? Treat the string like a date and see if that works.
>>>>>>>  
>>>>>>> So yeah, agreed, nobody does or would dispatch this way, you have to frog boil into it. It’s slow, almost incomprehensible, and incredibly good at holding onto life, even building on it. But being a webapp and parsing web.xml has little to do with dispatching and having sensible http api that doesn’t work like it’s a perl compiler.
>>>>>>>  
>>>>>>> Anyway, I can’t imagine trying to trace through that code with any solid feeling about having a handle on it via inspection. But just like Perl, if you debug / trace log the hell out of it, it is all actually pretty simple to adjust and reign in. 
>>>>>>>  
>>>>>>>  
>>>>>>> MRM
>>>>>>>  
>>>>>>>  
>>>>>>>  
>>>>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>> So I'm not interested in *adhering* to anything here, just using stuff where it helps... I see tools sitting on the shelf, already bought and paid for and carried with us to every job but then they sit in the back of the truck mostly unused... and (I think) they look useful. 
>>>>>>>>  
>>>>>>>> This should not be seen as in any way advocating a return to war file deployment or "choose your container". All of what I would suggest should likely be compatible with a jetty startup controlled by our code (I assume we can convince it to read web.xml properly when doing so). Certainly point out anything that would interfere with that.
>>>>>>>>  
>>>>>>>> Here we sit in a container that is a tool box containing:
>>>>>>>> A nice mechanism for initializing stuff at the application level, (ServletContextListener) and a well defined guarantee that it runs before any filter or servlet, 
>>>>>>>> An automatic shutdown mechanism that it will call for us on graceful shutdown (ServletContextListner again). 
>>>>>>>> A nice layered filter chain mechanism which could have been used to layer in things like tracing and authentication, and close shields etc as small succinct filters rather than weaving them into an ever more complex filter & call class.
>>>>>>>> In more recent versions of the spec, for listeners defined in web.xml the order is also guaranteed.
>>>>>>>> Servlet classes that are *already* set up to distinguish http verbs automatically when desired
>>>>>>>> So why is this better? Because monster classes that do a hundred things are really hard to understand and maintain. Small methods, small classes whenever possible. I also suspect that there may be some gains in performance to be had if we rely more on the container (which will already be dispatching based on path) to choose our code paths (at least at a coarse level) and then have less custom dispatch logic executed on *every* request 
>>>>>>>>  
>>>>>>>> Obviously I'm wrong and if the net result is less performant to any significant degree forget it that wouldn't be worth it. (wanted: standardized solr benchmarks)
>>>>>>>>  
>>>>>>>> There WILL be complications with v2 because it is a subclass of HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it should be a separate servlet from v1, but we don't want to duplicate code either... so work to do there...  
>>>>>>>>  
>>>>>>>> I think an incremental approach is necessary since very few of us have the bandwidth to more than that and certainly it becomes difficult to find anyone with the time to review large changes. What I have thus far is stable with respect to the tests, and simplifies some stuff already which is why I chose this point to start the discussion
>>>>>>>>  
>>>>>>>> But yeah, look at what I did and say what you like and what you don't. :). 
>>>>>>>>  
>>>>>>>> -Gus
>>>>>>>>  
>>>>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <dsmiley@apache.org <ma...@apache.org>> wrote:
>>>>>>>>> Sounds like an interesting adventure last weekend.
>>>>>>>>>  
>>>>>>>>> I'm unclear what the point of going this direction is; my instinct is to go the opposite direction.  You seem to suggest there are some simplification/organization benefits, which I love, so I'll need to look at what you've done to judge that for myself.  Yes Jetty supports the Servlet spec but we need not embrace it.  Adhering to that is useful if you have a generic web app that can be deployed to a container of the user's convenience/choosing.  No doubt this is why Solr started this way, and why the apps I built in my early days adhered to that spec.  But since 6.0, we view Solr as self-contained and more supportable if the project makes these decisions and thus not needlessly constrain itself as well.
>>>>>>>>>  
>>>>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and not a *Servlet* itself.
>>>>>>>>>  
>>>>>>>>> Also, I suspect there may be complications in changes here relating to Solr's v1 vs v2 API.  And most definitely also what you discovered -- JettySolrRunner.
>>>>>>>>> 
>>>>>>>>> ~ David Smiley
>>>>>>>>> Apache Lucene/Solr Search Developer
>>>>>>>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>>>> TLDR: I've got a working branch <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer & our startup process is extracted from SolrDispatch Filter. Do other folks think that is interesting enough that I should make a JIRA and/or PR with intention to make this change?
>>>>>>>>>>  
>>>>>>>>>> Details:
>>>>>>>>>>  
>>>>>>>>>> Jetty is a servlet container, yet we more or less ignore it by stuffing everything into a single filter (almost, admin UI is served separately). I'm sure there are lots of historical reasons for this, probably including servlet containers and their specs were much less mature when solr was first started. Maybe also the early authors were more focused on making search work than leveraging what the container could do for them (but I wasn't there for that so that's just a guess).
>>>>>>>>>>  
>>>>>>>>>> The result is that we have a couple of very large classes that are touched by almost every request, and they have a lot of conditional logic trying to decide what the user is asking. Normally this sort of dispatch is done by the container based on the request URL to direct it to the appropriate servlet. Solr does a LOT of different things so this code is extremely tricky and complex to understand. Specifically, I'm speaking of SolrDispatchFilter and HttpSolrCall, which are so inseparable that being two classes probably makes them harder to understand.
>>>>>>>>>>  
>>>>>>>>>> It seems to me (and this mail is asking if you agree) that these classes are long overdue for some subdivision. The most obvious thing to pull out is all the admin calls. Admin code paths really have little or nothing to do with query or update code paths since there are no documents to route or sub-requests to some subset of nodes. 
>>>>>>>>>>  
>>>>>>>>>> The primary obstacle to any such separation and simplification is that most requests have some interaction with CoreContainer, and the things it holds, and this is initialized and held by a field in SolrDispatchFilter. After spending a significant chunk of time reading this code in the prior weeks and a timely and motivating conversation with Eric Pugh, I dumped a chunk of my weekend into an experiment to see if I could pull CoreContainer out of the dispatch filter, and leverage the facilities of our servlet container. 
>>>>>>>>>>  
>>>>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very confusing, and worrisome since I've realized it's not respecting our web.xml at all, and any configuration in web.xml needs to be duplicated for our tests in JettySolrRunner (tangent alert)
>>>>>>>>>>  
>>>>>>>>>> The result is that CoreContainer is now held by a class called CoreService (please help me name things if you don't like my names :) ). CoreService is a ServletContextListener, appropriately configured in web.xml, and has a static method that can be used to get a reference to the CoreContainer corresponding to the ServletContext in which code wanting a core container is running (this supports having multiple JettySolrRunners in tests, though probably never has more than one CoreContainer in the running application)
>>>>>>>>>>  
>>>>>>>>>> I achieved this in 4 stages shown here: https://github.com/gus-asf/solr/tree/servlet-solr <https://github.com/gus-asf/solr/tree/servlet-solr>
>>>>>>>>>>  
>>>>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted without harm.
>>>>>>>>>>  
>>>>>>>>>> Since the current state of the code in that branch is apparently test-stable (4 runs of check in a row passing, none slower than any run of 3 runs of main, both as low as 10.5 min if I don't continue working on the machine)...
>>>>>>>>>>  
>>>>>>>>>> Do we want to push this refactor in now to avoid making a huge ball of changes that gets harder and harder to merge? The next push point would probably be when AdminServlet was functional (if/when that happens) (and we could not push that class for now).
>>>>>>>>>>  
>>>>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this would be so I felt the need to prove it to my self in code before wasting your time, but please don't hesitate to point to costs I might be missing or anything that looks like a mistake, or say why this was a total waste of time :)
>>>>>>>>>>  
>>>>>>>>>> Thoughts?
>>>>>>>>>>  
>>>>>>>>>> -Gus
>>>>>>>>>> -- 
>>>>>>>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>>>>>>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  
>>>>>>>> -- 
>>>>>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>>>>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>>>>>>> 
>>>>>>> -- 
>>>>>>> - Mark
>>>>>>>  
>>>>>>> http://about.me/markrmiller <http://about.me/markrmiller>
>>>>>> 
>>>>>>  
>>>>>> -- 
>>>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>>>>> 
>>>>> 
>>>>>  
>>>>> -- 
>>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>>>> 
>>>> 
>>>>  
>>>> -- 
>>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>>> 
>>> 
>>>  
>>> -- 
>>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>>> http://www.the111shift.com <http://www.the111shift.com/> (play)
>> 
>> 
>>  
>> -- 
>> http://www.needhamsoftware.com <http://www.needhamsoftware.com/> (work)
>> http://www.the111shift.com <http://www.the111shift.com/> (play)
> 
> -- 
> - Mark
>  
> http://about.me/markrmiller <http://about.me/markrmiller>

RE: Solr and Jetty and Servlets

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

 

Thanks Mark, I fully agree with you on this: Booting up jetty with start.jar and using a web application is dramatically slower.

 

Nevertheless, I think that we should go away with the web application alltogether. We can still stay with the servlet API, but just – as in Gus patch – to split responsibilities and go away with a singleton ServletFilter, although from my own experience, I figured out that Servletapis’s URI handling is far away from ideal. In most cases at end I always landed in a separate “master” servlet (hooked to “/*”) that only parses the PathInfo with regular expressions (one thing that does not work with servlet api) and then delegates using RequestDispatcher to several more specific servlets or returns 404 not found. This setup would also help to split the huge DispatchFilter mess, but still the URI parsing logic would be centralized. But this servlet would only take care of the dispatching, so DispatchServlet would be fine. But it should only dispatch, nothing else.

 

My ideal Solr would be: Add some org.apache.solr.Bootstrap class to the main distribution and remove start.jar completely. Also remove jetty-webapp.jar and related dependencies. In Boostrap just start Jetty embedded, add listen ip, port numbers, etc. We can still use the config API of Jetty for this, so maybe put this into an XML file (although XML is a bad format for this stuff, goes back to web applications). Then Bootsrap starts the Corecontainer and the embedded Zookeeper (this is another horrible thing in Solr: the embedded zookeper starts somehow as part of the web application) and hooks the servlets into a (at first) single root ServletContextHandler. This is upside down. So Solr starts with a Bootstrap class, starts zookeeper, jetty and hands over to CoreContainer.

 

Some other ideas (not sure if they are already implemented in the patch): Maybe we should install a separate ServletContextHandler/Classloader for each Core. This would allow us to go away with our own classloader magic. Bootstrap could also do this: It starts the CoreContainer, Zookeeper, Jetty Installs a servlet for the admin UI and the Core/Collection management and then hands over to the CoreContainer to initialize all Cores. Each CoreContainer gets his own ServletContextHandler and can therefore easily be loaded unloaded from Jetty.

 

I wrote many other apps in my daily live like this, the startup time (especially for microservices) is great. A Jetty starting up in milliseconds and going to service is unbeatable.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Mark Miller <ma...@gmail.com> 
Sent: Wednesday, August 18, 2021 9:08 AM
To: Gus Heck <gu...@gmail.com>
Cc: dev@solr.apache.org
Subject: Re: Solr and Jetty and Servlets

 

Agreement or not, I wouldn’t step in front of whatever you did.

 

I doubt anyone is going to take on actually pulling out of a web app because it’s easier to lean on it for configuration and I doubt start up speed is a very high priority for most.

 

It had never really occurred to me in the past that JettySolrRunner was so much faster. I just know it because of driving the test down to ridiculously times, starting up and stopping mini clusters of whatever size in literally no time. With no QuickStart setup for embedded Jetty.

 

So than starting up standalone Solr, it became apparent how much slower it was. So I started poking around. Cloud guys like those at google had already been feeling pain - preferring to be able to spin up and down relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s thing how they were dealing with it. And it’s simple, but it’s still easily non comparable at the end of the day. Solr ships with like 10,000 classes across a ton of jars, even with QuickStart, you have to follow the spec of what to do and what might be done by any given webapp - though as JettySolrRunner shows, we don’t actually need any of it. QuickStart is trying to find every trick to speed up following the container and webapp spec. Injecting the dispatch filter just tosses it all out :)

 

Any, may point was not that I disagree and you should make Solr inject a dispatch filter - my point is just that it doesn’t really matter to dispatch. You can do just as much via Jetty either way.

 

And anyone that doesn’t agree about “something should be done” is either lying or not looking. So I wouldn’t worry about agreement. The problem is that it’s quicksand that people are avoiding, not that anyone that looks can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted to work towards getting out of that hole. But at the end of the day, it’s now part of the hole. Made it deeper. Bermuda Triangle. But look at it - there was hope there. Hope that one day, dispatching would be better. Rather than twice as complicated. I don’t know that hope lives on. But no one gonna stop you on reasonable technical sense with any proper license.

 

Mark

 

 

 

On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

*ServletContextHandler not ServletContext 

 

On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

Ok nope, since that only happens with WebAppContext, not ServletContext (the parent class). whew :)

 

On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus lack of metadata-complete=true in web.xml means that we wind up scanning for annotations? 

 

On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

https://issues.apache.org/jira/browse/SOLR-15590

 

On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

Ok well it would be interesting to compare quickstarting vs JettySolrRunner, and reading up on quickstart <https://webtide.com/jetty-9-quick-start/>  gives ideas about pre-building the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either way we are still in a container and I think I hear some agreement that something should be done about the dispatch here, and both of you seem to agree that an actual servlet would make more sense than a filter, so I'll make a ticket/pr to make it easier to track & easier to read the code.

 

On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <markrmiller@gmail.com <ma...@gmail.com> > wrote:

The downside to respecting web.xml and making JettySolrServer serve a webapp is that loading a webapp is very expensive and slow in comparison.  JettySolrServer actually starts up extremely quickly. It’s almost more appealing to change the Server to use the JettySolrServer strategy. It’s so slow to load a webapp because of all the things it needs to support and scan jars for - in a kind of JavaEE situation, though not as bad. Jetty QuickStart does improve the situation if used at least. But there is really no need to eat the whole webapp standard for a non webapp app to have Jetty manage more of the dispatching. I always wondering about the motivation / upside of hacking in a straight servlet myself. But for a non webapp stuck in a servlet container, it’s actually a beautiful move that side steps a bunch of slow crap even better than the QuickStart pushes ever will, and still allows for matching any support we would want.

 

The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I made a slim SolrCall base class that each extends. All that logic is hairy enough to follow without them interleaving and sharing in a kind of wild collaboration. It’s pretty unfortunate they both still exist.

 

You are right, leaving that path in that state is a poor idea. If there was anything that could be considered the “hot” path, it’s right there - feverishly checking and ripping up streams for anyone and anything. Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path down? Do a dance? Treat the string like a date and see if that works.

 

So yeah, agreed, nobody does or would dispatch this way, you have to frog boil into it. It’s slow, almost incomprehensible, and incredibly good at holding onto life, even building on it. But being a webapp and parsing web.xml has little to do with dispatching and having sensible http api that doesn’t work like it’s a perl compiler.

 

Anyway, I can’t imagine trying to trace through that code with any solid feeling about having a handle on it via inspection. But just like Perl, if you debug / trace log the hell out of it, it is all actually pretty simple to adjust and reign in. 

 

 

MRM

 

 

 

On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

So I'm not interested in *adhering* to anything here, just using stuff where it helps... I see tools sitting on the shelf, already bought and paid for and carried with us to every job but then they sit in the back of the truck mostly unused... and (I think) they look useful. 

 

This should not be seen as in any way advocating a return to war file deployment or "choose your container". All of what I would suggest should likely be compatible with a jetty startup controlled by our code (I assume we can convince it to read web.xml properly when doing so). Certainly point out anything that would interfere with that.

 

Here we sit in a container that is a tool box containing:

*	A nice mechanism for initializing stuff at the application level, (ServletContextListener) and a well defined guarantee that it runs before any filter or servlet, 
*	An automatic shutdown mechanism that it will call for us on graceful shutdown (ServletContextListner again). 
*	A nice layered filter chain mechanism which could have been used to layer in things like tracing and authentication, and close shields etc as small succinct filters rather than weaving them into an ever more complex filter & call class.
*	In more recent versions of the spec, for listeners defined in web.xml the order is also guaranteed.
*	Servlet classes that are *already* set up to distinguish http verbs automatically when desired

So why is this better? Because monster classes that do a hundred things are really hard to understand and maintain. Small methods, small classes whenever possible. I also suspect that there may be some gains in performance to be had if we rely more on the container (which will already be dispatching based on path) to choose our code paths (at least at a coarse level) and then have less custom dispatch logic executed on *every* request 

 

Obviously I'm wrong and if the net result is less performant to any significant degree forget it that wouldn't be worth it. (wanted: standardized solr benchmarks)

 

There WILL be complications with v2 because it is a subclass of HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it should be a separate servlet from v1, but we don't want to duplicate code either... so work to do there...  

 

I think an incremental approach is necessary since very few of us have the bandwidth to more than that and certainly it becomes difficult to find anyone with the time to review large changes. What I have thus far is stable with respect to the tests, and simplifies some stuff already which is why I chose this point to start the discussion

 

But yeah, look at what I did and say what you like and what you don't. :). 

 

-Gus

 

On Mon, Aug 16, 2021 at 9:42 PM David Smiley <dsmiley@apache.org <ma...@apache.org> > wrote:

Sounds like an interesting adventure last weekend.

 

I'm unclear what the point of going this direction is; my instinct is to go the opposite direction.  You seem to suggest there are some simplification/organization benefits, which I love, so I'll need to look at what you've done to judge that for myself.  Yes Jetty supports the Servlet spec but we need not embrace it.  Adhering to that is useful if you have a generic web app that can be deployed to a container of the user's convenience/choosing.  No doubt this is why Solr started this way, and why the apps I built in my early days adhered to that spec.  But since 6.0, we view Solr as self-contained and more supportable if the project makes these decisions and thus not needlessly constrain itself as well.

 

It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and not a *Servlet* itself.

 

Also, I suspect there may be complications in changes here relating to Solr's v1 vs v2 API.  And most definitely also what you discovered -- JettySolrRunner.




~ David Smiley

Apache Lucene/Solr Search Developer

http://www.linkedin.com/in/davidwsmiley

 

 

On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gus.heck@gmail.com <ma...@gmail.com> > wrote:

TLDR: I've got a working branch <https://github.com/gus-asf/solr/tree/servlet-solr>  where CoreContainer & our startup process is extracted from SolrDispatch Filter. Do other folks think that is interesting enough that I should make a JIRA and/or PR with intention to make this change?

 

Details:

 

Jetty is a servlet container, yet we more or less ignore it by stuffing everything into a single filter (almost, admin UI is served separately). I'm sure there are lots of historical reasons for this, probably including servlet containers and their specs were much less mature when solr was first started. Maybe also the early authors were more focused on making search work than leveraging what the container could do for them (but I wasn't there for that so that's just a guess).

 

The result is that we have a couple of very large classes that are touched by almost every request, and they have a lot of conditional logic trying to decide what the user is asking. Normally this sort of dispatch is done by the container based on the request URL to direct it to the appropriate servlet. Solr does a LOT of different things so this code is extremely tricky and complex to understand. Specifically, I'm speaking of SolrDispatchFilter and HttpSolrCall, which are so inseparable that being two classes probably makes them harder to understand.

 

It seems to me (and this mail is asking if you agree) that these classes are long overdue for some subdivision. The most obvious thing to pull out is all the admin calls. Admin code paths really have little or nothing to do with query or update code paths since there are no documents to route or sub-requests to some subset of nodes. 

 

The primary obstacle to any such separation and simplification is that most requests have some interaction with CoreContainer, and the things it holds, and this is initialized and held by a field in SolrDispatchFilter. After spending a significant chunk of time reading this code in the prior weeks and a timely and motivating conversation with Eric Pugh, I dumped a chunk of my weekend into an experiment to see if I could pull CoreContainer out of the dispatch filter, and leverage the facilities of our servlet container. 

 

That wasn't too terribly hard, but keeping JettySolrRunner happy was very confusing, and worrisome since I've realized it's not respecting our web.xml at all, and any configuration in web.xml needs to be duplicated for our tests in JettySolrRunner (tangent alert)

 

The result is that CoreContainer is now held by a class called CoreService (please help me name things if you don't like my names :) ). CoreService is a ServletContextListener, appropriately configured in web.xml, and has a static method that can be used to get a reference to the CoreContainer corresponding to the ServletContext in which code wanting a core container is running (this supports having multiple JettySolrRunners in tests, though probably never has more than one CoreContainer in the running application)

 

I achieved this in 4 stages shown here: https://github.com/gus-asf/solr/tree/servlet-solr

 

Ignore the AdminServlet class, it's a placeholder, and can be subtracted without harm.

 

Since the current state of the code in that branch is apparently test-stable (4 runs of check in a row passing, none slower than any run of 3 runs of main, both as low as 10.5 min if I don't continue working on the machine)...

 

Do we want to push this refactor in now to avoid making a huge ball of changes that gets harder and harder to merge? The next push point would probably be when AdminServlet was functional (if/when that happens) (and we could not push that class for now).

 

If you read this far, thanks :) I wasn't sure how feasible this would be so I felt the need to prove it to my self in code before wasting your time, but please don't hesitate to point to costs I might be missing or anything that looks like a mistake, or say why this was a total waste of time :)

 

Thoughts?

 

-Gus

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)

-- 

- Mark

 

http://about.me/markrmiller




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)




 

-- 

http://www.needhamsoftware.com (work)

http://www.the111shift.com (play)

-- 

- Mark

 

http://about.me/markrmiller


Re: Solr and Jetty and Servlets

Posted by Mark Miller <ma...@gmail.com>.
Agreement or not, I wouldn’t step in front of whatever you did.

I doubt anyone is going to take on actually pulling out of a web app
because it’s easier to lean on it for configuration and I doubt start up
speed is a very high priority for most.

It had never really occurred to me in the past that JettySolrRunner was so
much faster. I just know it because of driving the test down to
ridiculously times, starting up and stopping mini clusters of whatever size
in literally no time. With no QuickStart setup for embedded Jetty.

So than starting up standalone Solr, it became apparent how much slower it
was. So I started poking around. Cloud guys like those at google had
already been feeling pain - preferring to be able to spin up and down
relatively quickly - and so this QuickStart “precompile” web.xml / JSP’s
thing how they were dealing with it. And it’s simple, but it’s still easily
non comparable at the end of the day. Solr ships with like 10,000 classes
across a ton of jars, even with QuickStart, you have to follow the spec of
what to do and what might be done by any given webapp - though as
JettySolrRunner shows, we don’t actually need any of it. QuickStart is
trying to find every trick to speed up following the container and webapp
spec. Injecting the dispatch filter just tosses it all out :)

Any, may point was not that I disagree and you should make Solr inject a
dispatch filter - my point is just that it doesn’t really matter to
dispatch. You can do just as much via Jetty either way.

And anyone that doesn’t agree about “something should be done” is either
lying or not looking. So I wouldn’t worry about agreement. The problem is
that it’s quicksand that people are avoiding, not that anyone that looks
can’t see it a huge rats nest. Look at the V2 API. Someone actually wanted
to work towards getting out of that hole. But at the end of the day, it’s
now part of the hole. Made it deeper. Bermuda Triangle. But look at it -
there was hope there. Hope that one day, dispatching would be better.
Rather than twice as complicated. I don’t know that hope lives on. But no
one gonna stop you on reasonable technical sense with any proper license.

Mark



On Tue, Aug 17, 2021 at 10:12 AM Gus Heck <gu...@gmail.com> wrote:

> *ServletContextHandler not ServletContext
>
> On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:
>
>> Ok nope, since that only happens with WebAppContext, not ServletContext
>> (the parent class). whew :)
>>
>> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>>
>>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
>>> thus lack of metadata-complete=true in web.xml means that we wind up
>>> scanning for annotations?
>>>
>>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>>> https://issues.apache.org/jira/browse/SOLR-15590
>>>>
>>>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>>>>
>>>>> Ok well it would be interesting to compare quickstarting vs
>>>>> JettySolrRunner, and reading up on quickstart
>>>>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>>>>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>>>>> tangent. Either way we are still in a container and I think I hear some
>>>>> agreement that something should be done about the dispatch here, and both
>>>>> of you seem to agree that an actual servlet would make more sense than a
>>>>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>>>>> read the code.
>>>>>
>>>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The downside to respecting web.xml and making JettySolrServer serve a
>>>>>> webapp is that loading a webapp is very expensive and slow in comparison.
>>>>>> JettySolrServer actually starts up extremely quickly. It’s almost more
>>>>>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>>>>>> slow to load a webapp because of all the things it needs to support and
>>>>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>>>>>> QuickStart does improve the situation if used at least. But there is really
>>>>>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>>>>>> manage more of the dispatching. I always wondering about the motivation /
>>>>>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>>>>>> in a servlet container, it’s actually a beautiful move that side steps a
>>>>>> bunch of slow crap even better than the QuickStart pushes ever will, and
>>>>>> still allows for matching any support we would want.
>>>>>>
>>>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally,
>>>>>> I made a slim SolrCall base class that each extends. All that logic is
>>>>>> hairy enough to follow without them interleaving and sharing in a kind of
>>>>>> wild collaboration. It’s pretty unfortunate they both still exist.
>>>>>>
>>>>>> You are right, leaving that path in that state is a poor idea. If
>>>>>> there was anything that could be considered the “hot” path, it’s right
>>>>>> there - feverishly checking and ripping up streams for anyone and anything.
>>>>>> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the
>>>>>> path down? Do a dance? Treat the string like a date and see if that works.
>>>>>>
>>>>>> So yeah, agreed, nobody does or would dispatch this way, you have to
>>>>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
>>>>>> at holding onto life, even building on it. But being a webapp and parsing
>>>>>> web.xml has little to do with dispatching and having sensible http api that
>>>>>> doesn’t work like it’s a perl compiler.
>>>>>>
>>>>>> Anyway, I can’t imagine trying to trace through that code with any
>>>>>> solid feeling about having a handle on it via inspection. But just like
>>>>>> Perl, if you debug / trace log the hell out of it, it is all actually
>>>>>> pretty simple to adjust and reign in.
>>>>>>
>>>>>>
>>>>>> MRM
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>>>>>
>>>>>>> So I'm not interested in *adhering* to anything here, just using
>>>>>>> stuff where it helps... I see tools sitting on the shelf, already bought
>>>>>>> and paid for and carried with us to every job but then they sit in the back
>>>>>>> of the truck mostly unused... and (I think) they look useful.
>>>>>>>
>>>>>>> This should not be seen as in any way advocating a return to war
>>>>>>> file deployment or "choose your container". All of what I would suggest
>>>>>>> should likely be compatible with a jetty startup controlled by our code (I
>>>>>>> assume we can convince it to read web.xml properly when doing so).
>>>>>>> Certainly point out anything that would interfere with that.
>>>>>>>
>>>>>>> Here we sit in a container that is a tool box containing:
>>>>>>>
>>>>>>>    - A nice mechanism for initializing stuff at the application
>>>>>>>    level, (ServletContextListener) and a well defined guarantee that it runs
>>>>>>>    before any filter or servlet,
>>>>>>>    - An automatic shutdown mechanism that it will call for us on
>>>>>>>    graceful shutdown (ServletContextListner again).
>>>>>>>    - A nice layered filter chain mechanism which could have been
>>>>>>>    used to layer in things like tracing and authentication, and close shields
>>>>>>>    etc as small succinct filters rather than weaving them into an ever more
>>>>>>>    complex filter & call class.
>>>>>>>    - In more recent versions of the spec, for listeners defined in
>>>>>>>    web.xml the order is also guaranteed.
>>>>>>>    - Servlet classes that are *already* set up to distinguish http
>>>>>>>    verbs automatically when desired
>>>>>>>
>>>>>>> So why is this better? Because monster classes that do a hundred
>>>>>>> things are really hard to understand and maintain. Small methods, small
>>>>>>> classes whenever possible. I also suspect that there may be some gains in
>>>>>>> performance to be had if we rely more on the container (which will already
>>>>>>> be dispatching based on path) to choose our code paths (at least at a
>>>>>>> coarse level) and then have less custom dispatch logic executed on *every*
>>>>>>> request
>>>>>>>
>>>>>>> Obviously I'm wrong and if the net result is less performant to any
>>>>>>> significant degree forget it that wouldn't be worth it. (wanted:
>>>>>>> standardized solr benchmarks)
>>>>>>>
>>>>>>> There WILL be complications with v2 because it is a subclass of
>>>>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>>>>>> should be a separate servlet from v1, but we don't want to duplicate code
>>>>>>> either... so work to do there...
>>>>>>>
>>>>>>> I think an incremental approach is necessary since very few of us
>>>>>>> have the bandwidth to more than that and certainly it becomes difficult to
>>>>>>> find anyone with the time to review large changes. What I have thus far is
>>>>>>> stable with respect to the tests, and simplifies some stuff already which
>>>>>>> is why I chose this point to start the discussion
>>>>>>>
>>>>>>> But yeah, look at what I did and say what you like and what you
>>>>>>> don't. :).
>>>>>>>
>>>>>>> -Gus
>>>>>>>
>>>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sounds like an interesting adventure last weekend.
>>>>>>>>
>>>>>>>> I'm unclear what the point of going this direction is; my instinct
>>>>>>>> is to go the opposite direction.  You seem to suggest there are some
>>>>>>>> simplification/organization benefits, which I love, so I'll need to look at
>>>>>>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>>>>>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>>>>>>> generic web app that can be deployed to a container of the user's
>>>>>>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>>>>>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>>>>>>> view Solr as self-contained and more supportable if the project makes these
>>>>>>>> decisions and thus not needlessly constrain itself as well.
>>>>>>>>
>>>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet
>>>>>>>> *Filter* and not a *Servlet* itself.
>>>>>>>>
>>>>>>>> Also, I suspect there may be complications in changes here relating
>>>>>>>> to Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>>>>>> JettySolrRunner.
>>>>>>>>
>>>>>>>> ~ David Smiley
>>>>>>>> Apache Lucene/Solr Search Developer
>>>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> *TLDR:* I've got a working branch
>>>>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>>>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>>>>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>>>>>>> and/or PR with intention to make this change?
>>>>>>>>>
>>>>>>>>> *Details:*
>>>>>>>>>
>>>>>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>>>>>> probably including servlet containers and their specs were much less mature
>>>>>>>>> when solr was first started. Maybe also the early authors were more focused
>>>>>>>>> on making search work than leveraging what the container could do for them
>>>>>>>>> (but I wasn't there for that so that's just a guess).
>>>>>>>>>
>>>>>>>>> The result is that we have a couple of very large classes that are
>>>>>>>>> touched by almost every request, and they have a lot of conditional logic
>>>>>>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>>>>>>> done by the container based on the request URL to direct it to the
>>>>>>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>>>>>>> two classes probably makes them harder to understand.
>>>>>>>>>
>>>>>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>>>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>>>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>>>>>> nothing to do with query or update code paths since there are no documents
>>>>>>>>> to route or sub-requests to some subset of nodes.
>>>>>>>>>
>>>>>>>>> The primary obstacle to any such separation and simplification is
>>>>>>>>> that most requests have some interaction with CoreContainer, and the things
>>>>>>>>> it holds, and this is initialized and held by a field in
>>>>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading this
>>>>>>>>> code in the prior weeks and a timely and motivating conversation with Eric
>>>>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I could
>>>>>>>>> pull CoreContainer out of the dispatch filter, and leverage the facilities
>>>>>>>>> of our servlet container.
>>>>>>>>>
>>>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy
>>>>>>>>> was very confusing, and worrisome since I've realized it's not respecting
>>>>>>>>> our web.xml at all, and any configuration in web.xml needs to be duplicated
>>>>>>>>> for our tests in JettySolrRunner (tangent alert)
>>>>>>>>>
>>>>>>>>> The result is that CoreContainer is now held by a class called
>>>>>>>>> CoreService (please help me name things if you don't like my names :) ).
>>>>>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>>>>>> web.xml, and has a static method that can be used to get a reference to the
>>>>>>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>>>>>>> core container is running (this supports having multiple JettySolrRunners
>>>>>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>>>>>> running application)
>>>>>>>>>
>>>>>>>>> I achieved this in 4 stages shown here:
>>>>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>>>>>
>>>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>>>>>> subtracted without harm.
>>>>>>>>>
>>>>>>>>> Since the current state of the code in that branch is apparently
>>>>>>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>>>>>>> machine)...
>>>>>>>>>
>>>>>>>>> Do we want to push this refactor in now to avoid making a huge
>>>>>>>>> ball of changes that gets harder and harder to merge? The next push point
>>>>>>>>> would probably be when AdminServlet was functional (if/when that happens)
>>>>>>>>> (and we could not push that class for now).
>>>>>>>>>
>>>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this
>>>>>>>>> would be so I felt the need to prove it to my self in code before wasting
>>>>>>>>> your time, but please don't hesitate to point to costs I might be missing
>>>>>>>>> or anything that looks like a mistake, or say why this was a total waste of
>>>>>>>>> time :)
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> -Gus
>>>>>>>>> --
>>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>> http://www.the111shift.com (play)
>>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://about.me/markrmiller
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
-- 
- Mark

http://about.me/markrmiller

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
*ServletContextHandler not ServletContext

On Tue, Aug 17, 2021 at 11:03 AM Gus Heck <gu...@gmail.com> wrote:

> Ok nope, since that only happens with WebAppContext, not ServletContext
> (the parent class). whew :)
>
> On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:
>
>> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
>> thus lack of metadata-complete=true in web.xml means that we wind up
>> scanning for annotations?
>>
>> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>>
>>> https://issues.apache.org/jira/browse/SOLR-15590
>>>
>>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>>> Ok well it would be interesting to compare quickstarting vs
>>>> JettySolrRunner, and reading up on quickstart
>>>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>>>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>>>> tangent. Either way we are still in a container and I think I hear some
>>>> agreement that something should be done about the dispatch here, and both
>>>> of you seem to agree that an actual servlet would make more sense than a
>>>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>>>> read the code.
>>>>
>>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> The downside to respecting web.xml and making JettySolrServer serve a
>>>>> webapp is that loading a webapp is very expensive and slow in comparison.
>>>>> JettySolrServer actually starts up extremely quickly. It’s almost more
>>>>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>>>>> slow to load a webapp because of all the things it needs to support and
>>>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>>>>> QuickStart does improve the situation if used at least. But there is really
>>>>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>>>>> manage more of the dispatching. I always wondering about the motivation /
>>>>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>>>>> in a servlet container, it’s actually a beautiful move that side steps a
>>>>> bunch of slow crap even better than the QuickStart pushes ever will, and
>>>>> still allows for matching any support we would want.
>>>>>
>>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>>>>> made a slim SolrCall base class that each extends. All that logic is hairy
>>>>> enough to follow without them interleaving and sharing in a kind of wild
>>>>> collaboration. It’s pretty unfortunate they both still exist.
>>>>>
>>>>> You are right, leaving that path in that state is a poor idea. If
>>>>> there was anything that could be considered the “hot” path, it’s right
>>>>> there - feverishly checking and ripping up streams for anyone and anything.
>>>>> Is a core? A collection? A handler? Maybe a V2 handler? What I’d we cut the
>>>>> path down? Do a dance? Treat the string like a date and see if that works.
>>>>>
>>>>> So yeah, agreed, nobody does or would dispatch this way, you have to
>>>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
>>>>> at holding onto life, even building on it. But being a webapp and parsing
>>>>> web.xml has little to do with dispatching and having sensible http api that
>>>>> doesn’t work like it’s a perl compiler.
>>>>>
>>>>> Anyway, I can’t imagine trying to trace through that code with any
>>>>> solid feeling about having a handle on it via inspection. But just like
>>>>> Perl, if you debug / trace log the hell out of it, it is all actually
>>>>> pretty simple to adjust and reign in.
>>>>>
>>>>>
>>>>> MRM
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>>>>
>>>>>> So I'm not interested in *adhering* to anything here, just using
>>>>>> stuff where it helps... I see tools sitting on the shelf, already bought
>>>>>> and paid for and carried with us to every job but then they sit in the back
>>>>>> of the truck mostly unused... and (I think) they look useful.
>>>>>>
>>>>>> This should not be seen as in any way advocating a return to war file
>>>>>> deployment or "choose your container". All of what I would suggest should
>>>>>> likely be compatible with a jetty startup controlled by our code (I assume
>>>>>> we can convince it to read web.xml properly when doing so). Certainly point
>>>>>> out anything that would interfere with that.
>>>>>>
>>>>>> Here we sit in a container that is a tool box containing:
>>>>>>
>>>>>>    - A nice mechanism for initializing stuff at the application
>>>>>>    level, (ServletContextListener) and a well defined guarantee that it runs
>>>>>>    before any filter or servlet,
>>>>>>    - An automatic shutdown mechanism that it will call for us on
>>>>>>    graceful shutdown (ServletContextListner again).
>>>>>>    - A nice layered filter chain mechanism which could have been
>>>>>>    used to layer in things like tracing and authentication, and close shields
>>>>>>    etc as small succinct filters rather than weaving them into an ever more
>>>>>>    complex filter & call class.
>>>>>>    - In more recent versions of the spec, for listeners defined in
>>>>>>    web.xml the order is also guaranteed.
>>>>>>    - Servlet classes that are *already* set up to distinguish http
>>>>>>    verbs automatically when desired
>>>>>>
>>>>>> So why is this better? Because monster classes that do a hundred
>>>>>> things are really hard to understand and maintain. Small methods, small
>>>>>> classes whenever possible. I also suspect that there may be some gains in
>>>>>> performance to be had if we rely more on the container (which will already
>>>>>> be dispatching based on path) to choose our code paths (at least at a
>>>>>> coarse level) and then have less custom dispatch logic executed on *every*
>>>>>> request
>>>>>>
>>>>>> Obviously I'm wrong and if the net result is less performant to any
>>>>>> significant degree forget it that wouldn't be worth it. (wanted:
>>>>>> standardized solr benchmarks)
>>>>>>
>>>>>> There WILL be complications with v2 because it is a subclass of
>>>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>>>>> should be a separate servlet from v1, but we don't want to duplicate code
>>>>>> either... so work to do there...
>>>>>>
>>>>>> I think an incremental approach is necessary since very few of us
>>>>>> have the bandwidth to more than that and certainly it becomes difficult to
>>>>>> find anyone with the time to review large changes. What I have thus far is
>>>>>> stable with respect to the tests, and simplifies some stuff already which
>>>>>> is why I chose this point to start the discussion
>>>>>>
>>>>>> But yeah, look at what I did and say what you like and what you
>>>>>> don't. :).
>>>>>>
>>>>>> -Gus
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Sounds like an interesting adventure last weekend.
>>>>>>>
>>>>>>> I'm unclear what the point of going this direction is; my instinct
>>>>>>> is to go the opposite direction.  You seem to suggest there are some
>>>>>>> simplification/organization benefits, which I love, so I'll need to look at
>>>>>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>>>>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>>>>>> generic web app that can be deployed to a container of the user's
>>>>>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>>>>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>>>>>> view Solr as self-contained and more supportable if the project makes these
>>>>>>> decisions and thus not needlessly constrain itself as well.
>>>>>>>
>>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet
>>>>>>> *Filter* and not a *Servlet* itself.
>>>>>>>
>>>>>>> Also, I suspect there may be complications in changes here relating
>>>>>>> to Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>>>>> JettySolrRunner.
>>>>>>>
>>>>>>> ~ David Smiley
>>>>>>> Apache Lucene/Solr Search Developer
>>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> *TLDR:* I've got a working branch
>>>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>>>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>>>>>> and/or PR with intention to make this change?
>>>>>>>>
>>>>>>>> *Details:*
>>>>>>>>
>>>>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>>>>> probably including servlet containers and their specs were much less mature
>>>>>>>> when solr was first started. Maybe also the early authors were more focused
>>>>>>>> on making search work than leveraging what the container could do for them
>>>>>>>> (but I wasn't there for that so that's just a guess).
>>>>>>>>
>>>>>>>> The result is that we have a couple of very large classes that are
>>>>>>>> touched by almost every request, and they have a lot of conditional logic
>>>>>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>>>>>> done by the container based on the request URL to direct it to the
>>>>>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>>>>>> two classes probably makes them harder to understand.
>>>>>>>>
>>>>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>>>>> nothing to do with query or update code paths since there are no documents
>>>>>>>> to route or sub-requests to some subset of nodes.
>>>>>>>>
>>>>>>>> The primary obstacle to any such separation and simplification is
>>>>>>>> that most requests have some interaction with CoreContainer, and the things
>>>>>>>> it holds, and this is initialized and held by a field in
>>>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading this
>>>>>>>> code in the prior weeks and a timely and motivating conversation with Eric
>>>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I could
>>>>>>>> pull CoreContainer out of the dispatch filter, and leverage the facilities
>>>>>>>> of our servlet container.
>>>>>>>>
>>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy
>>>>>>>> was very confusing, and worrisome since I've realized it's not respecting
>>>>>>>> our web.xml at all, and any configuration in web.xml needs to be duplicated
>>>>>>>> for our tests in JettySolrRunner (tangent alert)
>>>>>>>>
>>>>>>>> The result is that CoreContainer is now held by a class called
>>>>>>>> CoreService (please help me name things if you don't like my names :) ).
>>>>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>>>>> web.xml, and has a static method that can be used to get a reference to the
>>>>>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>>>>>> core container is running (this supports having multiple JettySolrRunners
>>>>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>>>>> running application)
>>>>>>>>
>>>>>>>> I achieved this in 4 stages shown here:
>>>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>>>>
>>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>>>>> subtracted without harm.
>>>>>>>>
>>>>>>>> Since the current state of the code in that branch is apparently
>>>>>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>>>>>> machine)...
>>>>>>>>
>>>>>>>> Do we want to push this refactor in now to avoid making a huge ball
>>>>>>>> of changes that gets harder and harder to merge? The next push point would
>>>>>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>>>>>> could not push that class for now).
>>>>>>>>
>>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this
>>>>>>>> would be so I felt the need to prove it to my self in code before wasting
>>>>>>>> your time, but please don't hesitate to point to costs I might be missing
>>>>>>>> or anything that looks like a mistake, or say why this was a total waste of
>>>>>>>> time :)
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> -Gus
>>>>>>>> --
>>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>>> http://www.the111shift.com (play)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://about.me/markrmiller
>>>>>
>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
Ok nope, since that only happens with WebAppContext, not ServletContext
(the parent class). whew :)

On Tue, Aug 17, 2021 at 10:33 AM Gus Heck <gu...@gmail.com> wrote:

> WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and
> thus lack of metadata-complete=true in web.xml means that we wind up
> scanning for annotations?
>
> On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:
>
>> https://issues.apache.org/jira/browse/SOLR-15590
>>
>> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>>
>>> Ok well it would be interesting to compare quickstarting vs
>>> JettySolrRunner, and reading up on quickstart
>>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>>> tangent. Either way we are still in a container and I think I hear some
>>> agreement that something should be done about the dispatch here, and both
>>> of you seem to agree that an actual servlet would make more sense than a
>>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>>> read the code.
>>>
>>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
>>> wrote:
>>>
>>>> The downside to respecting web.xml and making JettySolrServer serve a
>>>> webapp is that loading a webapp is very expensive and slow in comparison.
>>>> JettySolrServer actually starts up extremely quickly. It’s almost more
>>>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>>>> slow to load a webapp because of all the things it needs to support and
>>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>>>> QuickStart does improve the situation if used at least. But there is really
>>>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>>>> manage more of the dispatching. I always wondering about the motivation /
>>>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>>>> in a servlet container, it’s actually a beautiful move that side steps a
>>>> bunch of slow crap even better than the QuickStart pushes ever will, and
>>>> still allows for matching any support we would want.
>>>>
>>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>>>> made a slim SolrCall base class that each extends. All that logic is hairy
>>>> enough to follow without them interleaving and sharing in a kind of wild
>>>> collaboration. It’s pretty unfortunate they both still exist.
>>>>
>>>> You are right, leaving that path in that state is a poor idea. If there
>>>> was anything that could be considered the “hot” path, it’s right there -
>>>> feverishly checking and ripping up streams for anyone and anything. Is a
>>>> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
>>>> down? Do a dance? Treat the string like a date and see if that works.
>>>>
>>>> So yeah, agreed, nobody does or would dispatch this way, you have to
>>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
>>>> at holding onto life, even building on it. But being a webapp and parsing
>>>> web.xml has little to do with dispatching and having sensible http api that
>>>> doesn’t work like it’s a perl compiler.
>>>>
>>>> Anyway, I can’t imagine trying to trace through that code with any
>>>> solid feeling about having a handle on it via inspection. But just like
>>>> Perl, if you debug / trace log the hell out of it, it is all actually
>>>> pretty simple to adjust and reign in.
>>>>
>>>>
>>>> MRM
>>>>
>>>>
>>>>
>>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>>>
>>>>> So I'm not interested in *adhering* to anything here, just using stuff
>>>>> where it helps... I see tools sitting on the shelf, already bought and paid
>>>>> for and carried with us to every job but then they sit in the back of the
>>>>> truck mostly unused... and (I think) they look useful.
>>>>>
>>>>> This should not be seen as in any way advocating a return to war file
>>>>> deployment or "choose your container". All of what I would suggest should
>>>>> likely be compatible with a jetty startup controlled by our code (I assume
>>>>> we can convince it to read web.xml properly when doing so). Certainly point
>>>>> out anything that would interfere with that.
>>>>>
>>>>> Here we sit in a container that is a tool box containing:
>>>>>
>>>>>    - A nice mechanism for initializing stuff at the application
>>>>>    level, (ServletContextListener) and a well defined guarantee that it runs
>>>>>    before any filter or servlet,
>>>>>    - An automatic shutdown mechanism that it will call for us on
>>>>>    graceful shutdown (ServletContextListner again).
>>>>>    - A nice layered filter chain mechanism which could have been used
>>>>>    to layer in things like tracing and authentication, and close shields etc
>>>>>    as small succinct filters rather than weaving them into an ever more
>>>>>    complex filter & call class.
>>>>>    - In more recent versions of the spec, for listeners defined in
>>>>>    web.xml the order is also guaranteed.
>>>>>    - Servlet classes that are *already* set up to distinguish http
>>>>>    verbs automatically when desired
>>>>>
>>>>> So why is this better? Because monster classes that do a hundred
>>>>> things are really hard to understand and maintain. Small methods, small
>>>>> classes whenever possible. I also suspect that there may be some gains in
>>>>> performance to be had if we rely more on the container (which will already
>>>>> be dispatching based on path) to choose our code paths (at least at a
>>>>> coarse level) and then have less custom dispatch logic executed on *every*
>>>>> request
>>>>>
>>>>> Obviously I'm wrong and if the net result is less performant to any
>>>>> significant degree forget it that wouldn't be worth it. (wanted:
>>>>> standardized solr benchmarks)
>>>>>
>>>>> There WILL be complications with v2 because it is a subclass of
>>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>>>> should be a separate servlet from v1, but we don't want to duplicate code
>>>>> either... so work to do there...
>>>>>
>>>>> I think an incremental approach is necessary since very few of us have
>>>>> the bandwidth to more than that and certainly it becomes difficult to find
>>>>> anyone with the time to review large changes. What I have thus far is
>>>>> stable with respect to the tests, and simplifies some stuff already which
>>>>> is why I chose this point to start the discussion
>>>>>
>>>>> But yeah, look at what I did and say what you like and what you don't.
>>>>> :).
>>>>>
>>>>> -Gus
>>>>>
>>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Sounds like an interesting adventure last weekend.
>>>>>>
>>>>>> I'm unclear what the point of going this direction is; my instinct is
>>>>>> to go the opposite direction.  You seem to suggest there are some
>>>>>> simplification/organization benefits, which I love, so I'll need to look at
>>>>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>>>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>>>>> generic web app that can be deployed to a container of the user's
>>>>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>>>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>>>>> view Solr as self-contained and more supportable if the project makes these
>>>>>> decisions and thus not needlessly constrain itself as well.
>>>>>>
>>>>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
>>>>>> and not a *Servlet* itself.
>>>>>>
>>>>>> Also, I suspect there may be complications in changes here relating
>>>>>> to Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>>>> JettySolrRunner.
>>>>>>
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>>>>>
>>>>>>> *TLDR:* I've got a working branch
>>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>>>>> and/or PR with intention to make this change?
>>>>>>>
>>>>>>> *Details:*
>>>>>>>
>>>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>>>> probably including servlet containers and their specs were much less mature
>>>>>>> when solr was first started. Maybe also the early authors were more focused
>>>>>>> on making search work than leveraging what the container could do for them
>>>>>>> (but I wasn't there for that so that's just a guess).
>>>>>>>
>>>>>>> The result is that we have a couple of very large classes that are
>>>>>>> touched by almost every request, and they have a lot of conditional logic
>>>>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>>>>> done by the container based on the request URL to direct it to the
>>>>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>>>>> two classes probably makes them harder to understand.
>>>>>>>
>>>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>>>> nothing to do with query or update code paths since there are no documents
>>>>>>> to route or sub-requests to some subset of nodes.
>>>>>>>
>>>>>>> The primary obstacle to any such separation and simplification is
>>>>>>> that most requests have some interaction with CoreContainer, and the things
>>>>>>> it holds, and this is initialized and held by a field in
>>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading this
>>>>>>> code in the prior weeks and a timely and motivating conversation with Eric
>>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I could
>>>>>>> pull CoreContainer out of the dispatch filter, and leverage the facilities
>>>>>>> of our servlet container.
>>>>>>>
>>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>>>>> very confusing, and worrisome since I've realized it's not respecting our
>>>>>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>>>>>> our tests in JettySolrRunner (tangent alert)
>>>>>>>
>>>>>>> The result is that CoreContainer is now held by a class called
>>>>>>> CoreService (please help me name things if you don't like my names :) ).
>>>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>>>> web.xml, and has a static method that can be used to get a reference to the
>>>>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>>>>> core container is running (this supports having multiple JettySolrRunners
>>>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>>>> running application)
>>>>>>>
>>>>>>> I achieved this in 4 stages shown here:
>>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>>>
>>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>>>> subtracted without harm.
>>>>>>>
>>>>>>> Since the current state of the code in that branch is apparently
>>>>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>>>>> machine)...
>>>>>>>
>>>>>>> Do we want to push this refactor in now to avoid making a huge ball
>>>>>>> of changes that gets harder and harder to merge? The next push point would
>>>>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>>>>> could not push that class for now).
>>>>>>>
>>>>>>> If you read this far, thanks :) I wasn't sure how feasible this
>>>>>>> would be so I felt the need to prove it to my self in code before wasting
>>>>>>> your time, but please don't hesitate to point to costs I might be missing
>>>>>>> or anything that looks like a mistake, or say why this was a total waste of
>>>>>>> time :)
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> -Gus
>>>>>>> --
>>>>>>> http://www.needhamsoftware.com (work)
>>>>>>> http://www.the111shift.com (play)
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://about.me/markrmiller
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus
lack of metadata-complete=true in web.xml means that we wind up scanning
for annotations?

On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <gu...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/SOLR-15590
>
> On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:
>
>> Ok well it would be interesting to compare quickstarting vs
>> JettySolrRunner, and reading up on quickstart
>> <https://webtide.com/jetty-9-quick-start/> gives ideas about
>> pre-building the quickstart xml, but web.xml for JettySolrRunner is a
>> tangent. Either way we are still in a container and I think I hear some
>> agreement that something should be done about the dispatch here, and both
>> of you seem to agree that an actual servlet would make more sense than a
>> filter, so I'll make a ticket/pr to make it easier to track & easier to
>> read the code.
>>
>> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com>
>> wrote:
>>
>>> The downside to respecting web.xml and making JettySolrServer serve a
>>> webapp is that loading a webapp is very expensive and slow in comparison.
>>> JettySolrServer actually starts up extremely quickly. It’s almost more
>>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>>> slow to load a webapp because of all the things it needs to support and
>>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>>> QuickStart does improve the situation if used at least. But there is really
>>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>>> manage more of the dispatching. I always wondering about the motivation /
>>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>>> in a servlet container, it’s actually a beautiful move that side steps a
>>> bunch of slow crap even better than the QuickStart pushes ever will, and
>>> still allows for matching any support we would want.
>>>
>>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>>> made a slim SolrCall base class that each extends. All that logic is hairy
>>> enough to follow without them interleaving and sharing in a kind of wild
>>> collaboration. It’s pretty unfortunate they both still exist.
>>>
>>> You are right, leaving that path in that state is a poor idea. If there
>>> was anything that could be considered the “hot” path, it’s right there -
>>> feverishly checking and ripping up streams for anyone and anything. Is a
>>> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
>>> down? Do a dance? Treat the string like a date and see if that works.
>>>
>>> So yeah, agreed, nobody does or would dispatch this way, you have to
>>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good
>>> at holding onto life, even building on it. But being a webapp and parsing
>>> web.xml has little to do with dispatching and having sensible http api that
>>> doesn’t work like it’s a perl compiler.
>>>
>>> Anyway, I can’t imagine trying to trace through that code with any solid
>>> feeling about having a handle on it via inspection. But just like Perl, if
>>> you debug / trace log the hell out of it, it is all actually pretty simple
>>> to adjust and reign in.
>>>
>>>
>>> MRM
>>>
>>>
>>>
>>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>>
>>>> So I'm not interested in *adhering* to anything here, just using stuff
>>>> where it helps... I see tools sitting on the shelf, already bought and paid
>>>> for and carried with us to every job but then they sit in the back of the
>>>> truck mostly unused... and (I think) they look useful.
>>>>
>>>> This should not be seen as in any way advocating a return to war file
>>>> deployment or "choose your container". All of what I would suggest should
>>>> likely be compatible with a jetty startup controlled by our code (I assume
>>>> we can convince it to read web.xml properly when doing so). Certainly point
>>>> out anything that would interfere with that.
>>>>
>>>> Here we sit in a container that is a tool box containing:
>>>>
>>>>    - A nice mechanism for initializing stuff at the application level,
>>>>    (ServletContextListener) and a well defined guarantee that it runs before
>>>>    any filter or servlet,
>>>>    - An automatic shutdown mechanism that it will call for us on
>>>>    graceful shutdown (ServletContextListner again).
>>>>    - A nice layered filter chain mechanism which could have been used
>>>>    to layer in things like tracing and authentication, and close shields etc
>>>>    as small succinct filters rather than weaving them into an ever more
>>>>    complex filter & call class.
>>>>    - In more recent versions of the spec, for listeners defined in
>>>>    web.xml the order is also guaranteed.
>>>>    - Servlet classes that are *already* set up to distinguish http
>>>>    verbs automatically when desired
>>>>
>>>> So why is this better? Because monster classes that do a hundred things
>>>> are really hard to understand and maintain. Small methods, small classes
>>>> whenever possible. I also suspect that there may be some gains in
>>>> performance to be had if we rely more on the container (which will already
>>>> be dispatching based on path) to choose our code paths (at least at a
>>>> coarse level) and then have less custom dispatch logic executed on *every*
>>>> request
>>>>
>>>> Obviously I'm wrong and if the net result is less performant to any
>>>> significant degree forget it that wouldn't be worth it. (wanted:
>>>> standardized solr benchmarks)
>>>>
>>>> There WILL be complications with v2 because it is a subclass of
>>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>>> should be a separate servlet from v1, but we don't want to duplicate code
>>>> either... so work to do there...
>>>>
>>>> I think an incremental approach is necessary since very few of us have
>>>> the bandwidth to more than that and certainly it becomes difficult to find
>>>> anyone with the time to review large changes. What I have thus far is
>>>> stable with respect to the tests, and simplifies some stuff already which
>>>> is why I chose this point to start the discussion
>>>>
>>>> But yeah, look at what I did and say what you like and what you don't.
>>>> :).
>>>>
>>>> -Gus
>>>>
>>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org>
>>>> wrote:
>>>>
>>>>> Sounds like an interesting adventure last weekend.
>>>>>
>>>>> I'm unclear what the point of going this direction is; my instinct is
>>>>> to go the opposite direction.  You seem to suggest there are some
>>>>> simplification/organization benefits, which I love, so I'll need to look at
>>>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>>>> generic web app that can be deployed to a container of the user's
>>>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>>>> view Solr as self-contained and more supportable if the project makes these
>>>>> decisions and thus not needlessly constrain itself as well.
>>>>>
>>>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
>>>>> and not a *Servlet* itself.
>>>>>
>>>>> Also, I suspect there may be complications in changes here relating to
>>>>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>>> JettySolrRunner.
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>>
>>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>>>>
>>>>>> *TLDR:* I've got a working branch
>>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>>>> and/or PR with intention to make this change?
>>>>>>
>>>>>> *Details:*
>>>>>>
>>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>>> probably including servlet containers and their specs were much less mature
>>>>>> when solr was first started. Maybe also the early authors were more focused
>>>>>> on making search work than leveraging what the container could do for them
>>>>>> (but I wasn't there for that so that's just a guess).
>>>>>>
>>>>>> The result is that we have a couple of very large classes that are
>>>>>> touched by almost every request, and they have a lot of conditional logic
>>>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>>>> done by the container based on the request URL to direct it to the
>>>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>>>> two classes probably makes them harder to understand.
>>>>>>
>>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>>> nothing to do with query or update code paths since there are no documents
>>>>>> to route or sub-requests to some subset of nodes.
>>>>>>
>>>>>> The primary obstacle to any such separation and simplification is
>>>>>> that most requests have some interaction with CoreContainer, and the things
>>>>>> it holds, and this is initialized and held by a field in
>>>>>> SolrDispatchFilter. After spending a significant chunk of time reading this
>>>>>> code in the prior weeks and a timely and motivating conversation with Eric
>>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I could
>>>>>> pull CoreContainer out of the dispatch filter, and leverage the facilities
>>>>>> of our servlet container.
>>>>>>
>>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>>>> very confusing, and worrisome since I've realized it's not respecting our
>>>>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>>>>> our tests in JettySolrRunner (tangent alert)
>>>>>>
>>>>>> The result is that CoreContainer is now held by a class called
>>>>>> CoreService (please help me name things if you don't like my names :) ).
>>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>>> web.xml, and has a static method that can be used to get a reference to the
>>>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>>>> core container is running (this supports having multiple JettySolrRunners
>>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>>> running application)
>>>>>>
>>>>>> I achieved this in 4 stages shown here:
>>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>>
>>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>>> subtracted without harm.
>>>>>>
>>>>>> Since the current state of the code in that branch is apparently
>>>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>>>> machine)...
>>>>>>
>>>>>> Do we want to push this refactor in now to avoid making a huge ball
>>>>>> of changes that gets harder and harder to merge? The next push point would
>>>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>>>> could not push that class for now).
>>>>>>
>>>>>> If you read this far, thanks :) I wasn't sure how feasible this would
>>>>>> be so I felt the need to prove it to my self in code before wasting your
>>>>>> time, but please don't hesitate to point to costs I might be missing or
>>>>>> anything that looks like a mistake, or say why this was a total waste of
>>>>>> time :)
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> -Gus
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>> --
>>> - Mark
>>>
>>> http://about.me/markrmiller
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
https://issues.apache.org/jira/browse/SOLR-15590

On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <gu...@gmail.com> wrote:

> Ok well it would be interesting to compare quickstarting vs
> JettySolrRunner, and reading up on quickstart
> <https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building
> the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either
> way we are still in a container and I think I hear some agreement that
> something should be done about the dispatch here, and both of you seem to
> agree that an actual servlet would make more sense than a filter, so I'll
> make a ticket/pr to make it easier to track & easier to read the code.
>
> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com> wrote:
>
>> The downside to respecting web.xml and making JettySolrServer serve a
>> webapp is that loading a webapp is very expensive and slow in comparison.
>> JettySolrServer actually starts up extremely quickly. It’s almost more
>> appealing to change the Server to use the JettySolrServer strategy. It’s so
>> slow to load a webapp because of all the things it needs to support and
>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
>> QuickStart does improve the situation if used at least. But there is really
>> no need to eat the whole webapp standard for a non webapp app to have Jetty
>> manage more of the dispatching. I always wondering about the motivation /
>> upside of hacking in a straight servlet myself. But for a non webapp stuck
>> in a servlet container, it’s actually a beautiful move that side steps a
>> bunch of slow crap even better than the QuickStart pushes ever will, and
>> still allows for matching any support we would want.
>>
>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
>> made a slim SolrCall base class that each extends. All that logic is hairy
>> enough to follow without them interleaving and sharing in a kind of wild
>> collaboration. It’s pretty unfortunate they both still exist.
>>
>> You are right, leaving that path in that state is a poor idea. If there
>> was anything that could be considered the “hot” path, it’s right there -
>> feverishly checking and ripping up streams for anyone and anything. Is a
>> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
>> down? Do a dance? Treat the string like a date and see if that works.
>>
>> So yeah, agreed, nobody does or would dispatch this way, you have to frog
>> boil into it. It’s slow, almost incomprehensible, and incredibly good at
>> holding onto life, even building on it. But being a webapp and parsing
>> web.xml has little to do with dispatching and having sensible http api that
>> doesn’t work like it’s a perl compiler.
>>
>> Anyway, I can’t imagine trying to trace through that code with any solid
>> feeling about having a handle on it via inspection. But just like Perl, if
>> you debug / trace log the hell out of it, it is all actually pretty simple
>> to adjust and reign in.
>>
>>
>> MRM
>>
>>
>>
>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>>
>>> So I'm not interested in *adhering* to anything here, just using stuff
>>> where it helps... I see tools sitting on the shelf, already bought and paid
>>> for and carried with us to every job but then they sit in the back of the
>>> truck mostly unused... and (I think) they look useful.
>>>
>>> This should not be seen as in any way advocating a return to war file
>>> deployment or "choose your container". All of what I would suggest should
>>> likely be compatible with a jetty startup controlled by our code (I assume
>>> we can convince it to read web.xml properly when doing so). Certainly point
>>> out anything that would interfere with that.
>>>
>>> Here we sit in a container that is a tool box containing:
>>>
>>>    - A nice mechanism for initializing stuff at the application level,
>>>    (ServletContextListener) and a well defined guarantee that it runs before
>>>    any filter or servlet,
>>>    - An automatic shutdown mechanism that it will call for us on
>>>    graceful shutdown (ServletContextListner again).
>>>    - A nice layered filter chain mechanism which could have been used
>>>    to layer in things like tracing and authentication, and close shields etc
>>>    as small succinct filters rather than weaving them into an ever more
>>>    complex filter & call class.
>>>    - In more recent versions of the spec, for listeners defined in
>>>    web.xml the order is also guaranteed.
>>>    - Servlet classes that are *already* set up to distinguish http
>>>    verbs automatically when desired
>>>
>>> So why is this better? Because monster classes that do a hundred things
>>> are really hard to understand and maintain. Small methods, small classes
>>> whenever possible. I also suspect that there may be some gains in
>>> performance to be had if we rely more on the container (which will already
>>> be dispatching based on path) to choose our code paths (at least at a
>>> coarse level) and then have less custom dispatch logic executed on *every*
>>> request
>>>
>>> Obviously I'm wrong and if the net result is less performant to any
>>> significant degree forget it that wouldn't be worth it. (wanted:
>>> standardized solr benchmarks)
>>>
>>> There WILL be complications with v2 because it is a subclass of
>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>>> should be a separate servlet from v1, but we don't want to duplicate code
>>> either... so work to do there...
>>>
>>> I think an incremental approach is necessary since very few of us have
>>> the bandwidth to more than that and certainly it becomes difficult to find
>>> anyone with the time to review large changes. What I have thus far is
>>> stable with respect to the tests, and simplifies some stuff already which
>>> is why I chose this point to start the discussion
>>>
>>> But yeah, look at what I did and say what you like and what you don't.
>>> :).
>>>
>>> -Gus
>>>
>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>>>
>>>> Sounds like an interesting adventure last weekend.
>>>>
>>>> I'm unclear what the point of going this direction is; my instinct is
>>>> to go the opposite direction.  You seem to suggest there are some
>>>> simplification/organization benefits, which I love, so I'll need to look at
>>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>>> generic web app that can be deployed to a container of the user's
>>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>>> view Solr as self-contained and more supportable if the project makes these
>>>> decisions and thus not needlessly constrain itself as well.
>>>>
>>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
>>>> and not a *Servlet* itself.
>>>>
>>>> Also, I suspect there may be complications in changes here relating to
>>>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>>> JettySolrRunner.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>>>
>>>>> *TLDR:* I've got a working branch
>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>>> and/or PR with intention to make this change?
>>>>>
>>>>> *Details:*
>>>>>
>>>>> Jetty is a servlet container, yet we more or less ignore it by
>>>>> stuffing everything into a single filter (almost, admin UI is served
>>>>> separately). I'm sure there are lots of historical reasons for this,
>>>>> probably including servlet containers and their specs were much less mature
>>>>> when solr was first started. Maybe also the early authors were more focused
>>>>> on making search work than leveraging what the container could do for them
>>>>> (but I wasn't there for that so that's just a guess).
>>>>>
>>>>> The result is that we have a couple of very large classes that are
>>>>> touched by almost every request, and they have a lot of conditional logic
>>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>>> done by the container based on the request URL to direct it to the
>>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>>> two classes probably makes them harder to understand.
>>>>>
>>>>> It seems to me (and this mail is asking if you agree) that these
>>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>>> pull out is all the admin calls. Admin code paths really have little or
>>>>> nothing to do with query or update code paths since there are no documents
>>>>> to route or sub-requests to some subset of nodes.
>>>>>
>>>>> The primary obstacle to any such separation and simplification is that
>>>>> most requests have some interaction with CoreContainer, and the things it
>>>>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>>>>> After spending a significant chunk of time reading this code in the prior
>>>>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>>>>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>>>>> out of the dispatch filter, and leverage the facilities of our servlet
>>>>> container.
>>>>>
>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>>> very confusing, and worrisome since I've realized it's not respecting our
>>>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>>>> our tests in JettySolrRunner (tangent alert)
>>>>>
>>>>> The result is that CoreContainer is now held by a class called
>>>>> CoreService (please help me name things if you don't like my names :) ).
>>>>> CoreService is a ServletContextListener, appropriately configured in
>>>>> web.xml, and has a static method that can be used to get a reference to the
>>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>>> core container is running (this supports having multiple JettySolrRunners
>>>>> in tests, though probably never has more than one CoreContainer in the
>>>>> running application)
>>>>>
>>>>> I achieved this in 4 stages shown here:
>>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>>
>>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>>> subtracted without harm.
>>>>>
>>>>> Since the current state of the code in that branch is apparently
>>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>>> machine)...
>>>>>
>>>>> Do we want to push this refactor in now to avoid making a huge ball of
>>>>> changes that gets harder and harder to merge? The next push point would
>>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>>> could not push that class for now).
>>>>>
>>>>> If you read this far, thanks :) I wasn't sure how feasible this would
>>>>> be so I felt the need to prove it to my self in code before wasting your
>>>>> time, but please don't hesitate to point to costs I might be missing or
>>>>> anything that looks like a mistake, or say why this was a total waste of
>>>>> time :)
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> -Gus
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
Ok well it would be interesting to compare quickstarting vs
JettySolrRunner, and reading up on quickstart
<https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building
the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either
way we are still in a container and I think I hear some agreement that
something should be done about the dispatch here, and both of you seem to
agree that an actual servlet would make more sense than a filter, so I'll
make a ticket/pr to make it easier to track & easier to read the code.

On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <ma...@gmail.com> wrote:

> The downside to respecting web.xml and making JettySolrServer serve a
> webapp is that loading a webapp is very expensive and slow in comparison.
> JettySolrServer actually starts up extremely quickly. It’s almost more
> appealing to change the Server to use the JettySolrServer strategy. It’s so
> slow to load a webapp because of all the things it needs to support and
> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
> QuickStart does improve the situation if used at least. But there is really
> no need to eat the whole webapp standard for a non webapp app to have Jetty
> manage more of the dispatching. I always wondering about the motivation /
> upside of hacking in a straight servlet myself. But for a non webapp stuck
> in a servlet container, it’s actually a beautiful move that side steps a
> bunch of slow crap even better than the QuickStart pushes ever will, and
> still allows for matching any support we would want.
>
> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
> made a slim SolrCall base class that each extends. All that logic is hairy
> enough to follow without them interleaving and sharing in a kind of wild
> collaboration. It’s pretty unfortunate they both still exist.
>
> You are right, leaving that path in that state is a poor idea. If there
> was anything that could be considered the “hot” path, it’s right there -
> feverishly checking and ripping up streams for anyone and anything. Is a
> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
> down? Do a dance? Treat the string like a date and see if that works.
>
> So yeah, agreed, nobody does or would dispatch this way, you have to frog
> boil into it. It’s slow, almost incomprehensible, and incredibly good at
> holding onto life, even building on it. But being a webapp and parsing
> web.xml has little to do with dispatching and having sensible http api that
> doesn’t work like it’s a perl compiler.
>
> Anyway, I can’t imagine trying to trace through that code with any solid
> feeling about having a handle on it via inspection. But just like Perl, if
> you debug / trace log the hell out of it, it is all actually pretty simple
> to adjust and reign in.
>
>
> MRM
>
>
>
> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:
>
>> So I'm not interested in *adhering* to anything here, just using stuff
>> where it helps... I see tools sitting on the shelf, already bought and paid
>> for and carried with us to every job but then they sit in the back of the
>> truck mostly unused... and (I think) they look useful.
>>
>> This should not be seen as in any way advocating a return to war file
>> deployment or "choose your container". All of what I would suggest should
>> likely be compatible with a jetty startup controlled by our code (I assume
>> we can convince it to read web.xml properly when doing so). Certainly point
>> out anything that would interfere with that.
>>
>> Here we sit in a container that is a tool box containing:
>>
>>    - A nice mechanism for initializing stuff at the application level,
>>    (ServletContextListener) and a well defined guarantee that it runs before
>>    any filter or servlet,
>>    - An automatic shutdown mechanism that it will call for us on
>>    graceful shutdown (ServletContextListner again).
>>    - A nice layered filter chain mechanism which could have been used to
>>    layer in things like tracing and authentication, and close shields etc as
>>    small succinct filters rather than weaving them into an ever more complex
>>    filter & call class.
>>    - In more recent versions of the spec, for listeners defined in
>>    web.xml the order is also guaranteed.
>>    - Servlet classes that are *already* set up to distinguish http verbs
>>    automatically when desired
>>
>> So why is this better? Because monster classes that do a hundred things
>> are really hard to understand and maintain. Small methods, small classes
>> whenever possible. I also suspect that there may be some gains in
>> performance to be had if we rely more on the container (which will already
>> be dispatching based on path) to choose our code paths (at least at a
>> coarse level) and then have less custom dispatch logic executed on *every*
>> request
>>
>> Obviously I'm wrong and if the net result is less performant to any
>> significant degree forget it that wouldn't be worth it. (wanted:
>> standardized solr benchmarks)
>>
>> There WILL be complications with v2 because it is a subclass of
>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>> should be a separate servlet from v1, but we don't want to duplicate code
>> either... so work to do there...
>>
>> I think an incremental approach is necessary since very few of us have
>> the bandwidth to more than that and certainly it becomes difficult to find
>> anyone with the time to review large changes. What I have thus far is
>> stable with respect to the tests, and simplifies some stuff already which
>> is why I chose this point to start the discussion
>>
>> But yeah, look at what I did and say what you like and what you don't.
>> :).
>>
>> -Gus
>>
>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>>
>>> Sounds like an interesting adventure last weekend.
>>>
>>> I'm unclear what the point of going this direction is; my instinct is to
>>> go the opposite direction.  You seem to suggest there are some
>>> simplification/organization benefits, which I love, so I'll need to look at
>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>> generic web app that can be deployed to a container of the user's
>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>> view Solr as self-contained and more supportable if the project makes these
>>> decisions and thus not needlessly constrain itself as well.
>>>
>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
>>> and not a *Servlet* itself.
>>>
>>> Also, I suspect there may be complications in changes here relating to
>>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>> JettySolrRunner.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>>
>>>> *TLDR:* I've got a working branch
>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>> and/or PR with intention to make this change?
>>>>
>>>> *Details:*
>>>>
>>>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>>>> everything into a single filter (almost, admin UI is served separately).
>>>> I'm sure there are lots of historical reasons for this, probably including
>>>> servlet containers and their specs were much less mature when solr was
>>>> first started. Maybe also the early authors were more focused on making
>>>> search work than leveraging what the container could do for them (but I
>>>> wasn't there for that so that's just a guess).
>>>>
>>>> The result is that we have a couple of very large classes that are
>>>> touched by almost every request, and they have a lot of conditional logic
>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>> done by the container based on the request URL to direct it to the
>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>> two classes probably makes them harder to understand.
>>>>
>>>> It seems to me (and this mail is asking if you agree) that these
>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>> pull out is all the admin calls. Admin code paths really have little or
>>>> nothing to do with query or update code paths since there are no documents
>>>> to route or sub-requests to some subset of nodes.
>>>>
>>>> The primary obstacle to any such separation and simplification is that
>>>> most requests have some interaction with CoreContainer, and the things it
>>>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>>>> After spending a significant chunk of time reading this code in the prior
>>>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>>>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>>>> out of the dispatch filter, and leverage the facilities of our servlet
>>>> container.
>>>>
>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>> very confusing, and worrisome since I've realized it's not respecting our
>>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>>> our tests in JettySolrRunner (tangent alert)
>>>>
>>>> The result is that CoreContainer is now held by a class called
>>>> CoreService (please help me name things if you don't like my names :) ).
>>>> CoreService is a ServletContextListener, appropriately configured in
>>>> web.xml, and has a static method that can be used to get a reference to the
>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>> core container is running (this supports having multiple JettySolrRunners
>>>> in tests, though probably never has more than one CoreContainer in the
>>>> running application)
>>>>
>>>> I achieved this in 4 stages shown here:
>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>
>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>> subtracted without harm.
>>>>
>>>> Since the current state of the code in that branch is apparently
>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>> machine)...
>>>>
>>>> Do we want to push this refactor in now to avoid making a huge ball of
>>>> changes that gets harder and harder to merge? The next push point would
>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>> could not push that class for now).
>>>>
>>>> If you read this far, thanks :) I wasn't sure how feasible this would
>>>> be so I felt the need to prove it to my self in code before wasting your
>>>> time, but please don't hesitate to point to costs I might be missing or
>>>> anything that looks like a mistake, or say why this was a total waste of
>>>> time :)
>>>>
>>>> Thoughts?
>>>>
>>>> -Gus
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
> --
> - Mark
>
> http://about.me/markrmiller
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Mark Miller <ma...@gmail.com>.
The downside to respecting web.xml and making JettySolrServer serve a
webapp is that loading a webapp is very expensive and slow in comparison.
JettySolrServer actually starts up extremely quickly. It’s almost more
appealing to change the Server to use the JettySolrServer strategy. It’s so
slow to load a webapp because of all the things it needs to support and
scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
QuickStart does improve the situation if used at least. But there is really
no need to eat the whole webapp standard for a non webapp app to have Jetty
manage more of the dispatching. I always wondering about the motivation /
upside of hacking in a straight servlet myself. But for a non webapp stuck
in a servlet container, it’s actually a beautiful move that side steps a
bunch of slow crap even better than the QuickStart pushes ever will, and
still allows for matching any support we would want.

The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I made
a slim SolrCall base class that each extends. All that logic is hairy
enough to follow without them interleaving and sharing in a kind of wild
collaboration. It’s pretty unfortunate they both still exist.

You are right, leaving that path in that state is a poor idea. If there was
anything that could be considered the “hot” path, it’s right there -
feverishly checking and ripping up streams for anyone and anything. Is a
core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
down? Do a dance? Treat the string like a date and see if that works.

So yeah, agreed, nobody does or would dispatch this way, you have to frog
boil into it. It’s slow, almost incomprehensible, and incredibly good at
holding onto life, even building on it. But being a webapp and parsing
web.xml has little to do with dispatching and having sensible http api that
doesn’t work like it’s a perl compiler.

Anyway, I can’t imagine trying to trace through that code with any solid
feeling about having a handle on it via inspection. But just like Perl, if
you debug / trace log the hell out of it, it is all actually pretty simple
to adjust and reign in.


MRM



On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <gu...@gmail.com> wrote:

> So I'm not interested in *adhering* to anything here, just using stuff
> where it helps... I see tools sitting on the shelf, already bought and paid
> for and carried with us to every job but then they sit in the back of the
> truck mostly unused... and (I think) they look useful.
>
> This should not be seen as in any way advocating a return to war file
> deployment or "choose your container". All of what I would suggest should
> likely be compatible with a jetty startup controlled by our code (I assume
> we can convince it to read web.xml properly when doing so). Certainly point
> out anything that would interfere with that.
>
> Here we sit in a container that is a tool box containing:
>
>    - A nice mechanism for initializing stuff at the application level,
>    (ServletContextListener) and a well defined guarantee that it runs before
>    any filter or servlet,
>    - An automatic shutdown mechanism that it will call for us on graceful
>    shutdown (ServletContextListner again).
>    - A nice layered filter chain mechanism which could have been used to
>    layer in things like tracing and authentication, and close shields etc as
>    small succinct filters rather than weaving them into an ever more complex
>    filter & call class.
>    - In more recent versions of the spec, for listeners defined in
>    web.xml the order is also guaranteed.
>    - Servlet classes that are *already* set up to distinguish http verbs
>    automatically when desired
>
> So why is this better? Because monster classes that do a hundred things
> are really hard to understand and maintain. Small methods, small classes
> whenever possible. I also suspect that there may be some gains in
> performance to be had if we rely more on the container (which will already
> be dispatching based on path) to choose our code paths (at least at a
> coarse level) and then have less custom dispatch logic executed on *every*
> request
>
> Obviously I'm wrong and if the net result is less performant to any
> significant degree forget it that wouldn't be worth it. (wanted:
> standardized solr benchmarks)
>
> There WILL be complications with v2 because it is a subclass of
> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
> should be a separate servlet from v1, but we don't want to duplicate code
> either... so work to do there...
>
> I think an incremental approach is necessary since very few of us have the
> bandwidth to more than that and certainly it becomes difficult to find
> anyone with the time to review large changes. What I have thus far is
> stable with respect to the tests, and simplifies some stuff already which
> is why I chose this point to start the discussion
>
> But yeah, look at what I did and say what you like and what you don't. :).
>
> -Gus
>
> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:
>
>> Sounds like an interesting adventure last weekend.
>>
>> I'm unclear what the point of going this direction is; my instinct is to
>> go the opposite direction.  You seem to suggest there are some
>> simplification/organization benefits, which I love, so I'll need to look at
>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>> spec but we need not embrace it.  Adhering to that is useful if you have a
>> generic web app that can be deployed to a container of the user's
>> convenience/choosing.  No doubt this is why Solr started this way, and why
>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>> view Solr as self-contained and more supportable if the project makes these
>> decisions and thus not needlessly constrain itself as well.
>>
>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
>> not a *Servlet* itself.
>>
>> Also, I suspect there may be complications in changes here relating to
>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>> JettySolrRunner.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>>
>>> *TLDR:* I've got a working branch
>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer
>>> & our startup process is extracted from SolrDispatch Filter. Do other folks
>>> think that is interesting enough that I should make a JIRA and/or PR with
>>> intention to make this change?
>>>
>>> *Details:*
>>>
>>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>>> everything into a single filter (almost, admin UI is served separately).
>>> I'm sure there are lots of historical reasons for this, probably including
>>> servlet containers and their specs were much less mature when solr was
>>> first started. Maybe also the early authors were more focused on making
>>> search work than leveraging what the container could do for them (but I
>>> wasn't there for that so that's just a guess).
>>>
>>> The result is that we have a couple of very large classes that are
>>> touched by almost every request, and they have a lot of conditional logic
>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>> done by the container based on the request URL to direct it to the
>>> appropriate servlet. Solr does a LOT of different things so this code is
>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>> two classes probably makes them harder to understand.
>>>
>>> It seems to me (and this mail is asking if you agree) that these classes
>>> are long overdue for some subdivision. The most obvious thing to pull out
>>> is all the admin calls. Admin code paths really have little or nothing to
>>> do with query or update code paths since there are no documents to route or
>>> sub-requests to some subset of nodes.
>>>
>>> The primary obstacle to any such separation and simplification is that
>>> most requests have some interaction with CoreContainer, and the things it
>>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>>> After spending a significant chunk of time reading this code in the prior
>>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>>> out of the dispatch filter, and leverage the facilities of our servlet
>>> container.
>>>
>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>> very confusing, and worrisome since I've realized it's not respecting our
>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>> our tests in JettySolrRunner (tangent alert)
>>>
>>> The result is that CoreContainer is now held by a class called
>>> CoreService (please help me name things if you don't like my names :) ).
>>> CoreService is a ServletContextListener, appropriately configured in
>>> web.xml, and has a static method that can be used to get a reference to the
>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>> core container is running (this supports having multiple JettySolrRunners
>>> in tests, though probably never has more than one CoreContainer in the
>>> running application)
>>>
>>> I achieved this in 4 stages shown here:
>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>
>>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
>>> without harm.
>>>
>>> Since the current state of the code in that branch is apparently
>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>> machine)...
>>>
>>> Do we want to push this refactor in now to avoid making a huge ball of
>>> changes that gets harder and harder to merge? The next push point would
>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>> could not push that class for now).
>>>
>>> If you read this far, thanks :) I wasn't sure how feasible this would be
>>> so I felt the need to prove it to my self in code before wasting your time,
>>> but please don't hesitate to point to costs I might be missing or anything
>>> that looks like a mistake, or say why this was a total waste of time :)
>>>
>>> Thoughts?
>>>
>>> -Gus
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
-- 
- Mark

http://about.me/markrmiller

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
So I'm not interested in *adhering* to anything here, just using stuff
where it helps... I see tools sitting on the shelf, already bought and paid
for and carried with us to every job but then they sit in the back of the
truck mostly unused... and (I think) they look useful.

This should not be seen as in any way advocating a return to war file
deployment or "choose your container". All of what I would suggest should
likely be compatible with a jetty startup controlled by our code (I assume
we can convince it to read web.xml properly when doing so). Certainly point
out anything that would interfere with that.

Here we sit in a container that is a tool box containing:

   - A nice mechanism for initializing stuff at the application level,
   (ServletContextListener) and a well defined guarantee that it runs before
   any filter or servlet,
   - An automatic shutdown mechanism that it will call for us on graceful
   shutdown (ServletContextListner again).
   - A nice layered filter chain mechanism which could have been used to
   layer in things like tracing and authentication, and close shields etc as
   small succinct filters rather than weaving them into an ever more complex
   filter & call class.
   - In more recent versions of the spec, for listeners defined in web.xml
   the order is also guaranteed.
   - Servlet classes that are *already* set up to distinguish http verbs
   automatically when desired

So why is this better? Because monster classes that do a hundred things are
really hard to understand and maintain. Small methods, small classes
whenever possible. I also suspect that there may be some gains in
performance to be had if we rely more on the container (which will already
be dispatching based on path) to choose our code paths (at least at a
coarse level) and then have less custom dispatch logic executed on *every*
request

Obviously I'm wrong and if the net result is less performant to any
significant degree forget it that wouldn't be worth it. (wanted:
standardized solr benchmarks)

There WILL be complications with v2 because it is a subclass of
HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
should be a separate servlet from v1, but we don't want to duplicate code
either... so work to do there...

I think an incremental approach is necessary since very few of us have the
bandwidth to more than that and certainly it becomes difficult to find
anyone with the time to review large changes. What I have thus far is
stable with respect to the tests, and simplifies some stuff already which
is why I chose this point to start the discussion

But yeah, look at what I did and say what you like and what you don't. :).

-Gus

On Mon, Aug 16, 2021 at 9:42 PM David Smiley <ds...@apache.org> wrote:

> Sounds like an interesting adventure last weekend.
>
> I'm unclear what the point of going this direction is; my instinct is to
> go the opposite direction.  You seem to suggest there are some
> simplification/organization benefits, which I love, so I'll need to look at
> what you've done to judge that for myself.  Yes Jetty supports the Servlet
> spec but we need not embrace it.  Adhering to that is useful if you have a
> generic web app that can be deployed to a container of the user's
> convenience/choosing.  No doubt this is why Solr started this way, and why
> the apps I built in my early days adhered to that spec.  But since 6.0, we
> view Solr as self-contained and more supportable if the project makes these
> decisions and thus not needlessly constrain itself as well.
>
> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
> not a *Servlet* itself.
>
> Also, I suspect there may be complications in changes here relating to
> Solr's v1 vs v2 API.  And most definitely also what you discovered --
> JettySolrRunner.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:
>
>> *TLDR:* I've got a working branch
>> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer
>> & our startup process is extracted from SolrDispatch Filter. Do other folks
>> think that is interesting enough that I should make a JIRA and/or PR with
>> intention to make this change?
>>
>> *Details:*
>>
>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>> everything into a single filter (almost, admin UI is served separately).
>> I'm sure there are lots of historical reasons for this, probably including
>> servlet containers and their specs were much less mature when solr was
>> first started. Maybe also the early authors were more focused on making
>> search work than leveraging what the container could do for them (but I
>> wasn't there for that so that's just a guess).
>>
>> The result is that we have a couple of very large classes that are
>> touched by almost every request, and they have a lot of conditional logic
>> trying to decide what the user is asking. Normally this sort of dispatch is
>> done by the container based on the request URL to direct it to the
>> appropriate servlet. Solr does a LOT of different things so this code is
>> extremely tricky and complex to understand. Specifically, I'm speaking of
>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>> two classes probably makes them harder to understand.
>>
>> It seems to me (and this mail is asking if you agree) that these classes
>> are long overdue for some subdivision. The most obvious thing to pull out
>> is all the admin calls. Admin code paths really have little or nothing to
>> do with query or update code paths since there are no documents to route or
>> sub-requests to some subset of nodes.
>>
>> The primary obstacle to any such separation and simplification is that
>> most requests have some interaction with CoreContainer, and the things it
>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>> After spending a significant chunk of time reading this code in the prior
>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>> out of the dispatch filter, and leverage the facilities of our servlet
>> container.
>>
>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very
>> confusing, and worrisome since I've realized it's not respecting our
>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>> our tests in JettySolrRunner (tangent alert)
>>
>> The result is that CoreContainer is now held by a class called
>> CoreService (please help me name things if you don't like my names :) ).
>> CoreService is a ServletContextListener, appropriately configured in
>> web.xml, and has a static method that can be used to get a reference to the
>> CoreContainer corresponding to the ServletContext in which code wanting a
>> core container is running (this supports having multiple JettySolrRunners
>> in tests, though probably never has more than one CoreContainer in the
>> running application)
>>
>> I achieved this in 4 stages shown here:
>> https://github.com/gus-asf/solr/tree/servlet-solr
>>
>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
>> without harm.
>>
>> Since the current state of the code in that branch is apparently
>> test-stable (4 runs of check in a row passing, none slower than any run of
>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>> machine)...
>>
>> Do we want to push this refactor in now to avoid making a huge ball of
>> changes that gets harder and harder to merge? The next push point would
>> probably be when AdminServlet was functional (if/when that happens) (and we
>> could not push that class for now).
>>
>> If you read this far, thanks :) I wasn't sure how feasible this would be
>> so I felt the need to prove it to my self in code before wasting your time,
>> but please don't hesitate to point to costs I might be missing or anything
>> that looks like a mistake, or say why this was a total waste of time :)
>>
>> Thoughts?
>>
>> -Gus
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by David Smiley <ds...@apache.org>.
Sounds like an interesting adventure last weekend.

I'm unclear what the point of going this direction is; my instinct is to go
the opposite direction.  You seem to suggest there are some
simplification/organization benefits, which I love, so I'll need to look at
what you've done to judge that for myself.  Yes Jetty supports the Servlet
spec but we need not embrace it.  Adhering to that is useful if you have a
generic web app that can be deployed to a container of the user's
convenience/choosing.  No doubt this is why Solr started this way, and why
the apps I built in my early days adhered to that spec.  But since 6.0, we
view Solr as self-contained and more supportable if the project makes these
decisions and thus not needlessly constrain itself as well.

It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
not a *Servlet* itself.

Also, I suspect there may be complications in changes here relating to
Solr's v1 vs v2 API.  And most definitely also what you discovered --
JettySolrRunner.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <gu...@gmail.com> wrote:

> *TLDR:* I've got a working branch
> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer &
> our startup process is extracted from SolrDispatch Filter. Do other folks
> think that is interesting enough that I should make a JIRA and/or PR with
> intention to make this change?
>
> *Details:*
>
> Jetty is a servlet container, yet we more or less ignore it by stuffing
> everything into a single filter (almost, admin UI is served separately).
> I'm sure there are lots of historical reasons for this, probably including
> servlet containers and their specs were much less mature when solr was
> first started. Maybe also the early authors were more focused on making
> search work than leveraging what the container could do for them (but I
> wasn't there for that so that's just a guess).
>
> The result is that we have a couple of very large classes that are touched
> by almost every request, and they have a lot of conditional logic trying to
> decide what the user is asking. Normally this sort of dispatch is done by
> the container based on the request URL to direct it to the appropriate
> servlet. Solr does a LOT of different things so this code is extremely
> tricky and complex to understand. Specifically, I'm speaking of
> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
> two classes probably makes them harder to understand.
>
> It seems to me (and this mail is asking if you agree) that these classes
> are long overdue for some subdivision. The most obvious thing to pull out
> is all the admin calls. Admin code paths really have little or nothing to
> do with query or update code paths since there are no documents to route or
> sub-requests to some subset of nodes.
>
> The primary obstacle to any such separation and simplification is that
> most requests have some interaction with CoreContainer, and the things it
> holds, and this is initialized and held by a field in SolrDispatchFilter.
> After spending a significant chunk of time reading this code in the prior
> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
> chunk of my weekend into an experiment to see if I could pull CoreContainer
> out of the dispatch filter, and leverage the facilities of our servlet
> container.
>
> That wasn't too terribly hard, but keeping JettySolrRunner happy was very
> confusing, and worrisome since I've realized it's not respecting our
> web.xml at all, and any configuration in web.xml needs to be duplicated for
> our tests in JettySolrRunner (tangent alert)
>
> The result is that CoreContainer is now held by a class called CoreService
> (please help me name things if you don't like my names :) ). CoreService is
> a ServletContextListener, appropriately configured in web.xml, and has a
> static method that can be used to get a reference to the CoreContainer
> corresponding to the ServletContext in which code wanting a core container
> is running (this supports having multiple JettySolrRunners in tests, though
> probably never has more than one CoreContainer in the running application)
>
> I achieved this in 4 stages shown here:
> https://github.com/gus-asf/solr/tree/servlet-solr
>
> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
> without harm.
>
> Since the current state of the code in that branch is apparently
> test-stable (4 runs of check in a row passing, none slower than any run of
> 3 runs of main, both as low as 10.5 min if I don't continue working on the
> machine)...
>
> Do we want to push this refactor in now to avoid making a huge ball of
> changes that gets harder and harder to merge? The next push point would
> probably be when AdminServlet was functional (if/when that happens) (and we
> could not push that class for now).
>
> If you read this far, thanks :) I wasn't sure how feasible this would be
> so I felt the need to prove it to my self in code before wasting your time,
> but please don't hesitate to point to costs I might be missing or anything
> that looks like a mistake, or say why this was a total waste of time :)
>
> Thoughts?
>
> -Gus
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
Would be nicer if I could go through them and record a decision then get a
green check mark so it was clear which ones were addressed vs punted but it
also takes nearly forever to complete.

On Sun, Nov 7, 2021 at 11:43 PM David Smiley <ds...@apache.org> wrote:

> Sonatype-lift giving a passing review on a PR is not a qualification we
> have on our project to merge.  Therefore it's also pointless to tell it to
> "ignore" anything.  The bot provides feedback at specific line numbers that
> we should at least briefly consider; ideally say something about in the
> comment to show (to each other) that we looked at it.  That's it.  Perhaps
> some projects choose to gate merging on Lift's complete validation.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Nov 7, 2021 at 8:31 PM Gus Heck <gu...@gmail.com> wrote:
>
>> https://github.com/apache/solr/pull/397 is the new PR for those
>> interested. Will try to merge it next weekend if no objections are found.
>>
>> Am a little unclear on what sonatype-lift is doing. I told it to ignore
>> one (that I filed a separate ticket for) then made a fix thinking that that
>> would cull out a couple but not all of the remaining items and then it
>> seemingly passed review so maybe ignore blocked an entire category of
>> warnings, not the specific item?
>>
>> On Wed, Sep 1, 2021 at 3:12 PM Houston Putman <ho...@gmail.com>
>> wrote:
>>
>>> We do not currently have the docker image being built and tested on CI.
>>> I can get working on that.
>>>
>>> On Wed, Sep 1, 2021 at 3:07 PM David Smiley <ds...@apache.org> wrote:
>>>
>>>> JettySolrRunner can only do so much.  There are other differences
>>>> between a real running server and JettySolrRunner such as our bin/solr
>>>> script and probably much and all of our Jetty configuration.  I've been
>>>> bitten by a classpath bug that is really only possible to see when you run
>>>> things for real.  To that end, I think our long term plan on integration
>>>> testing should be Docker, and don't worry *too much* about how
>>>> accurate JettySolrRunner runs Solr.  We're a major step ahead nowadays with
>>>> Docker being part of the project.  I think the next step is building and
>>>> running its tests on a CI -- I don't recall if this happens already?.  A
>>>> subsequent step https://issues.apache.org/jira/browse/SOLR-11872 (see
>>>> last comment summary) to allow more of our tests to work with a managed
>>>> SolrClient that is pluggable and can use a Solr instance running in Docker
>>>> in particular.  This is how I test our Solr plugins where I work -- using
>>>> different modes of Solr for different levels of testing and thoroughness.
>>>> I'll touch on this at an ApacheCon talk later this month.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Tue, Aug 31, 2021 at 6:44 PM Chris Hostetter <
>>>> hossman_lucene@fucit.org> wrote:
>>>>
>>>>>
>>>>> Gus: I skimmed very little of this mail/thread -- just enough to
>>>>> recognize
>>>>> that it's a rabbit hole I'm not ready to devote brain cells to, but
>>>>> aplaud
>>>>> your interest in doing so.
>>>>>
>>>>> I will comment on just one aspect of your email, only to point you at
>>>>> some
>>>>> "prior hole diving" i did, on sub-topic you mentioned, that you may or
>>>>> may
>>>>> not find useful...
>>>>>
>>>>> : That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>>> very
>>>>> : confusing, and worrisome since I've realized it's not respecting our
>>>>> : web.xml at all, and any configuration in web.xml needs to be
>>>>> duplicated for
>>>>> : our tests in JettySolrRunner (tangent alert)
>>>>>
>>>>>         https://issues.apache.org/jira/browse/SOLR-14903
>>>>>
>>>>> ...I'm not saying it's a good direction to go in, just that it's a
>>>>> past
>>>>> experience/mistake you mind find interesting.
>>>>>
>>>>>
>>>>> -Hoss
>>>>> http://www.lucidworks.com/
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>>
>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by David Smiley <ds...@apache.org>.
Sonatype-lift giving a passing review on a PR is not a qualification we
have on our project to merge.  Therefore it's also pointless to tell it to
"ignore" anything.  The bot provides feedback at specific line numbers that
we should at least briefly consider; ideally say something about in the
comment to show (to each other) that we looked at it.  That's it.  Perhaps
some projects choose to gate merging on Lift's complete validation.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Nov 7, 2021 at 8:31 PM Gus Heck <gu...@gmail.com> wrote:

> https://github.com/apache/solr/pull/397 is the new PR for those
> interested. Will try to merge it next weekend if no objections are found.
>
> Am a little unclear on what sonatype-lift is doing. I told it to ignore
> one (that I filed a separate ticket for) then made a fix thinking that that
> would cull out a couple but not all of the remaining items and then it
> seemingly passed review so maybe ignore blocked an entire category of
> warnings, not the specific item?
>
> On Wed, Sep 1, 2021 at 3:12 PM Houston Putman <ho...@gmail.com>
> wrote:
>
>> We do not currently have the docker image being built and tested on CI. I
>> can get working on that.
>>
>> On Wed, Sep 1, 2021 at 3:07 PM David Smiley <ds...@apache.org> wrote:
>>
>>> JettySolrRunner can only do so much.  There are other differences
>>> between a real running server and JettySolrRunner such as our bin/solr
>>> script and probably much and all of our Jetty configuration.  I've been
>>> bitten by a classpath bug that is really only possible to see when you run
>>> things for real.  To that end, I think our long term plan on integration
>>> testing should be Docker, and don't worry *too much* about how accurate
>>> JettySolrRunner runs Solr.  We're a major step ahead nowadays with Docker
>>> being part of the project.  I think the next step is building and running
>>> its tests on a CI -- I don't recall if this happens already?.  A subsequent
>>> step https://issues.apache.org/jira/browse/SOLR-11872 (see last comment
>>> summary) to allow more of our tests to work with a managed SolrClient that
>>> is pluggable and can use a Solr instance running in Docker in particular.
>>> This is how I test our Solr plugins where I work -- using different modes
>>> of Solr for different levels of testing and thoroughness.  I'll touch on
>>> this at an ApacheCon talk later this month.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Tue, Aug 31, 2021 at 6:44 PM Chris Hostetter <
>>> hossman_lucene@fucit.org> wrote:
>>>
>>>>
>>>> Gus: I skimmed very little of this mail/thread -- just enough to
>>>> recognize
>>>> that it's a rabbit hole I'm not ready to devote brain cells to, but
>>>> aplaud
>>>> your interest in doing so.
>>>>
>>>> I will comment on just one aspect of your email, only to point you at
>>>> some
>>>> "prior hole diving" i did, on sub-topic you mentioned, that you may or
>>>> may
>>>> not find useful...
>>>>
>>>> : That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>> very
>>>> : confusing, and worrisome since I've realized it's not respecting our
>>>> : web.xml at all, and any configuration in web.xml needs to be
>>>> duplicated for
>>>> : our tests in JettySolrRunner (tangent alert)
>>>>
>>>>         https://issues.apache.org/jira/browse/SOLR-14903
>>>>
>>>> ...I'm not saying it's a good direction to go in, just that it's a past
>>>> experience/mistake you mind find interesting.
>>>>
>>>>
>>>> -Hoss
>>>> http://www.lucidworks.com/
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>
>>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: Solr and Jetty and Servlets

Posted by Gus Heck <gu...@gmail.com>.
https://github.com/apache/solr/pull/397 is the new PR for those interested.
Will try to merge it next weekend if no objections are found.

Am a little unclear on what sonatype-lift is doing. I told it to ignore one
(that I filed a separate ticket for) then made a fix thinking that that
would cull out a couple but not all of the remaining items and then it
seemingly passed review so maybe ignore blocked an entire category of
warnings, not the specific item?

On Wed, Sep 1, 2021 at 3:12 PM Houston Putman <ho...@gmail.com>
wrote:

> We do not currently have the docker image being built and tested on CI. I
> can get working on that.
>
> On Wed, Sep 1, 2021 at 3:07 PM David Smiley <ds...@apache.org> wrote:
>
>> JettySolrRunner can only do so much.  There are other differences between
>> a real running server and JettySolrRunner such as our bin/solr script and
>> probably much and all of our Jetty configuration.  I've been bitten by a
>> classpath bug that is really only possible to see when you run things for
>> real.  To that end, I think our long term plan on integration testing
>> should be Docker, and don't worry *too much* about how accurate
>> JettySolrRunner runs Solr.  We're a major step ahead nowadays with Docker
>> being part of the project.  I think the next step is building and running
>> its tests on a CI -- I don't recall if this happens already?.  A subsequent
>> step https://issues.apache.org/jira/browse/SOLR-11872 (see last comment
>> summary) to allow more of our tests to work with a managed SolrClient that
>> is pluggable and can use a Solr instance running in Docker in particular.
>> This is how I test our Solr plugins where I work -- using different modes
>> of Solr for different levels of testing and thoroughness.  I'll touch on
>> this at an ApacheCon talk later this month.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Tue, Aug 31, 2021 at 6:44 PM Chris Hostetter <ho...@fucit.org>
>> wrote:
>>
>>>
>>> Gus: I skimmed very little of this mail/thread -- just enough to
>>> recognize
>>> that it's a rabbit hole I'm not ready to devote brain cells to, but
>>> aplaud
>>> your interest in doing so.
>>>
>>> I will comment on just one aspect of your email, only to point you at
>>> some
>>> "prior hole diving" i did, on sub-topic you mentioned, that you may or
>>> may
>>> not find useful...
>>>
>>> : That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>> very
>>> : confusing, and worrisome since I've realized it's not respecting our
>>> : web.xml at all, and any configuration in web.xml needs to be
>>> duplicated for
>>> : our tests in JettySolrRunner (tangent alert)
>>>
>>>         https://issues.apache.org/jira/browse/SOLR-14903
>>>
>>> ...I'm not saying it's a good direction to go in, just that it's a past
>>> experience/mistake you mind find interesting.
>>>
>>>
>>> -Hoss
>>> http://www.lucidworks.com/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>
>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Posted by Houston Putman <ho...@gmail.com>.
We do not currently have the docker image being built and tested on CI. I
can get working on that.

On Wed, Sep 1, 2021 at 3:07 PM David Smiley <ds...@apache.org> wrote:

> JettySolrRunner can only do so much.  There are other differences between
> a real running server and JettySolrRunner such as our bin/solr script and
> probably much and all of our Jetty configuration.  I've been bitten by a
> classpath bug that is really only possible to see when you run things for
> real.  To that end, I think our long term plan on integration testing
> should be Docker, and don't worry *too much* about how accurate
> JettySolrRunner runs Solr.  We're a major step ahead nowadays with Docker
> being part of the project.  I think the next step is building and running
> its tests on a CI -- I don't recall if this happens already?.  A subsequent
> step https://issues.apache.org/jira/browse/SOLR-11872 (see last comment
> summary) to allow more of our tests to work with a managed SolrClient that
> is pluggable and can use a Solr instance running in Docker in particular.
> This is how I test our Solr plugins where I work -- using different modes
> of Solr for different levels of testing and thoroughness.  I'll touch on
> this at an ApacheCon talk later this month.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Aug 31, 2021 at 6:44 PM Chris Hostetter <ho...@fucit.org>
> wrote:
>
>>
>> Gus: I skimmed very little of this mail/thread -- just enough to
>> recognize
>> that it's a rabbit hole I'm not ready to devote brain cells to, but
>> aplaud
>> your interest in doing so.
>>
>> I will comment on just one aspect of your email, only to point you at
>> some
>> "prior hole diving" i did, on sub-topic you mentioned, that you may or
>> may
>> not find useful...
>>
>> : That wasn't too terribly hard, but keeping JettySolrRunner happy was
>> very
>> : confusing, and worrisome since I've realized it's not respecting our
>> : web.xml at all, and any configuration in web.xml needs to be duplicated
>> for
>> : our tests in JettySolrRunner (tangent alert)
>>
>>         https://issues.apache.org/jira/browse/SOLR-14903
>>
>> ...I'm not saying it's a good direction to go in, just that it's a past
>> experience/mistake you mind find interesting.
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>> For additional commands, e-mail: dev-help@solr.apache.org
>>
>>

Re: Solr and Jetty and Servlets

Posted by David Smiley <ds...@apache.org>.
JettySolrRunner can only do so much.  There are other differences between a
real running server and JettySolrRunner such as our bin/solr script and
probably much and all of our Jetty configuration.  I've been bitten by a
classpath bug that is really only possible to see when you run things for
real.  To that end, I think our long term plan on integration testing
should be Docker, and don't worry *too much* about how accurate
JettySolrRunner runs Solr.  We're a major step ahead nowadays with Docker
being part of the project.  I think the next step is building and running
its tests on a CI -- I don't recall if this happens already?.  A subsequent
step https://issues.apache.org/jira/browse/SOLR-11872 (see last comment
summary) to allow more of our tests to work with a managed SolrClient that
is pluggable and can use a Solr instance running in Docker in particular.
This is how I test our Solr plugins where I work -- using different modes
of Solr for different levels of testing and thoroughness.  I'll touch on
this at an ApacheCon talk later this month.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Aug 31, 2021 at 6:44 PM Chris Hostetter <ho...@fucit.org>
wrote:

>
> Gus: I skimmed very little of this mail/thread -- just enough to recognize
> that it's a rabbit hole I'm not ready to devote brain cells to, but aplaud
> your interest in doing so.
>
> I will comment on just one aspect of your email, only to point you at some
> "prior hole diving" i did, on sub-topic you mentioned, that you may or may
> not find useful...
>
> : That wasn't too terribly hard, but keeping JettySolrRunner happy was very
> : confusing, and worrisome since I've realized it's not respecting our
> : web.xml at all, and any configuration in web.xml needs to be duplicated
> for
> : our tests in JettySolrRunner (tangent alert)
>
>         https://issues.apache.org/jira/browse/SOLR-14903
>
> ...I'm not saying it's a good direction to go in, just that it's a past
> experience/mistake you mind find interesting.
>
>
> -Hoss
> http://www.lucidworks.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> For additional commands, e-mail: dev-help@solr.apache.org
>
>

Re: Solr and Jetty and Servlets

Posted by Chris Hostetter <ho...@fucit.org>.
Gus: I skimmed very little of this mail/thread -- just enough to recognize 
that it's a rabbit hole I'm not ready to devote brain cells to, but aplaud 
your interest in doing so.

I will comment on just one aspect of your email, only to point you at some 
"prior hole diving" i did, on sub-topic you mentioned, that you may or may 
not find useful...

: That wasn't too terribly hard, but keeping JettySolrRunner happy was very
: confusing, and worrisome since I've realized it's not respecting our
: web.xml at all, and any configuration in web.xml needs to be duplicated for
: our tests in JettySolrRunner (tangent alert)

	https://issues.apache.org/jira/browse/SOLR-14903

...I'm not saying it's a good direction to go in, just that it's a past 
experience/mistake you mind find interesting.


-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org