Posted to dev@solr.apache.org by Mark Miller <ma...@gmail.com> on 2021/07/08 05:24:49 UTC
Requests for ideas on getting into Solr dev.

A while back a couple of outside devs reached out to me asking for advice on
how to get started doing Solr development.

Basically, I generally just say to find some item to improve that overlaps
with your own Solr usage needs or pet peeves. I got the feeling that was
not the best jump-starter for ideas though, and you usually have to come up
with a couple on the spot, which is when I have the fewest ideas.

For those people and any others in a similar situation, here is a small list
of ideas I have:


   - Add support for Jetty Quickstart
   https://webtide.com/jetty-9-quick-start/
   - Upgrade Jetty to 10 or 11. This brings default off-heap ByteBuffers for
   the client/server (this config went away in 9 and returns in 10/11) and
   improved HttpClient classes (the old ones are still around but
   deprecated). It also has improved Quickstart support and is where the
   current dev momentum is centered.
   - Add the most basic async Servlet request support, simply to run the
   request in our own thread pool, with no actual async processing done.
   Then Solr’s pool can be the unlimited pool and Jetty can have the limited
   pool that it expects and works best with.
   - Add a SolrQosFilter that extends Jetty’s QoSFilter. Allow internal
   requests through to prevent deadlock, but use the async Servlet request
   feature via QoSFilter to suspend external requests on overload and
   prioritize un-suspending them.
   - Dig into removing @ThreadLeakLingering(linger = 10000) from
   SolrTestCase.
   - Add a custom Jetty LifeCycleListener that gets notified of shutdown
   before Jetty shutdown has already begun (which is the situation when we
   are currently notified via ServletFilter#destroy). Remove the live node
   entry there.
   - Look into the current state of the static field checker for tests.
   Since the same JVM is used across many, and potentially all, of the tests,
   they should avoid leaving behind large static remnants in the test class.
   Is this check working? Are there offenders?
   - Look into the test rule for @BeforeClass / @AfterClass method name
   shadowing. This should prevent test class hierarchies from reusing the
   same name for these static methods, because that can cause bad, hard to
   pin down issues. Look into its effectiveness and current violations.
   See NoClassHooksShadowingRule.
   - Review for inefficient Collection / StringBuilder size init. Make
   sizing improvements where a lot of capacity growth is guaranteed or
   likely, or where we already know the size.
   - Run the tests using IntelliJ’s allocation profiler, Java Flight
   Recorder, YourKit with allocation recording on, etc., and note the top
   allocated objects. Look at simple changes (say reuse or alternative
   implementations) for some of the largest outliers.
   - Change some ZooKeeper usages to use the much more efficient async API.
   Its performance is essentially that of a MultiOp, but without the
   requirement that everything succeed/fail atomically. Bonus if you can
   change a path to run fully async, where step B only runs or consumes
   resources when ZK async call A is finished. Most systems that provide
   async also provide reasonably good back pressure for free. Lots of wins
   in this area.
   - Look at making ZkClient calls that come in and hit ConnectionLoss
   simply wait until ZooKeeper has connected again, rather than retrying or
   failing and attempting again. Verify that on ConnectionLoss, the system
   essentially goes quiet toward ZK until reconnection instead of ramping up
   trying to make something with ZK happen.
   - Investigate using the Lucene Segment Replicator replication strategy
   in PULL or TLOG replicas to take advantage of its NRT segment replication
   feature and awesome, isolated testing and Lucene integration.
   - Investigate combining or dropping calls to ZooKeeper. There are only
   so many types of calls that go to ZK, and they only usefully bring back
   information at a pretty slow rate in computer time. Review calls that are
   hitting many times a second or are just generally too frequent for the
   information being exchanged. Updating some timely item once a second is
   perhaps reasonable. Trying to do something so fast that the last call has
   not even completed, perhaps not. The system should not rely on cluster
   state being 100% up to date in the huge majority of cases, which means
   rapid updates are likely never very sensible.
   - Make it dead simple to set up Solr to log to a JSON output format.
   - Look at introducing fair locking into the TransactionLog. When locks
   on a path are constantly acquired and released, as happens there,
   especially with updates that can depend on each other, you can end up
   with older requests’ lock attempts often getting beaten by newer requests
   along the lock/unlock chain, and this can cause traffic mayhem.
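
To make the fair locking idea in the last item a bit more concrete, here is a
minimal Java sketch. It is not Solr's TransactionLog code: the class
FairLogSketch and its append method are hypothetical names made up purely for
illustration. The only real API it relies on is the JDK's ReentrantLock, whose
constructor takes a fairness flag. With fairness on, a contended lock favors
the longest-waiting thread, generally at some cost in raw throughput, so
whether it helps here would need measuring.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical stand-in for a log guarded by a single lock.
 * Assumption: this only illustrates the fairness flag; it does not
 * mirror how Solr's TransactionLog is actually structured.
 */
public class FairLogSketch {
    // true = fair ordering: under contention, the lock is granted to the
    // longest-waiting thread instead of letting new arrivals barge ahead.
    private final ReentrantLock lock = new ReentrantLock(true);

    // Total number of characters appended so far (guarded by lock).
    private long appendedChars = 0;

    /** Appends a record under the lock and returns the running total. */
    public long append(String record) {
        lock.lock();
        try {
            appendedChars += record.length();
            return appendedChars;
        } finally {
            // Always release in finally so a failure cannot strand waiters.
            lock.unlock();
        }
    }

    /** Reports whether the underlying lock uses the fair policy. */
    public boolean isFair() {
        return lock.isFair();
    }

    public static void main(String[] args) {
        FairLogSketch log = new FairLogSketch();
        System.out.println("fair: " + log.isFair());
        System.out.println("total after first append: " + log.append("abc"));
        System.out.println("total after second append: " + log.append("de"));
    }
}
```

Switching an existing lock to the fair policy is usually a one-argument
change, which makes it cheap to try behind a benchmark before committing to it.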

-- 
- Mark

http://about.me/markrmiller