Posted to dev@solr.apache.org by Mark Miller <ma...@gmail.com> on 2021/07/08 05:24:49 UTC
Requests for ideas on getting into Solr dev.
A while back a couple of outside devs reached out to me asking for advice
on how to get started doing Solr development.
I generally just say to find something to improve that lines up with your
own Solr usage or pet peeves. I got the feeling that was not the best
jump-starter for ideas, though, and you usually have to come up with a
couple on the spot, which is when I have the fewest ideas.
For those people and any others in a similar situation, here is a small
list of ideas I have:
- Add support for Jetty Quickstart
https://webtide.com/jetty-9-quick-start/
- Upgrade Jetty to 10 or 11. This brings default off-heap ByteBuffers for
the client/server (this config went away in 9 and returns in 10/11),
improved HttpClient classes (the old ones are still around but deprecated),
and improved Quickstart support. It is also where current dev momentum is
centered.
- Add the most basic Servlet Request async support, simply to run the
request thread in our own thread pool, with no async processing actually
done. Then Solr’s pool can be the unlimited pool and Jetty can have a
limited pool, which is what it expects and works best with.
- Add SolrQosFilter that extends Jetty’s QoSFilter. Allow internal
requests through to prevent deadlock, but use the async Servlet Request
feature via QoSFilter to suspend external requests on overload and
prioritize which ones are un-suspended.
- Dig into removing @ThreadLeakLingering(linger = 10000) from
SolrTestCase.
- Add a custom Jetty LifeCycleListener that gets notified of shutdown
before Jetty shutdown has begun (currently we are notified via
ServletFilter#destroy, by which point shutdown is already under way).
Remove the live node entry there.
- Look into current state of the static field checker for tests. Since
the same JVM is used across many to potentially all of the tests, they
should avoid leaving behind large static remnants in the test class. Is
this check working? Are there offenders?
- Look into the @BeforeClass / @AfterClass shadowing method name test
rule. This should prevent test class hierarchies from reusing the same name
for these static methods, because that can cause bad, hard-to-pin-down
issues. Look into its effectiveness and current violations.
See NoClassHooksShadowingRule.
- Review for inefficient Collection / StringBuilder size init. Make
sizing improvements where many capacity growths are guaranteed or likely,
or where we already know the size.
- Run the tests using IntelliJ’s allocation profiler, Java Flight
Recorder, YourKit with allocation recording on, etc., and note the top
allocated objects. Look at simple changes (say reuse or alternative
implementations) for some of the largest outliers.
- Change some ZooKeeper usages to use the much more efficient Async API.
Its performance is essentially that of a MultiOp, but without
the requirement that everything succeed/fail atomically. Bonus if you can
change a path to run fully async, where step B only runs or consumes
resources once ZK async call A has finished. Most systems that provide
async also provide reasonably good back pressure for free. Lots of wins in
this area.
- Look at making ZkClient calls that come in and hit ConnectionLoss
simply wait until ZooKeeper has connected again, rather than retrying,
failing, and attempting again. Verify that on ConnectionLoss, the system
essentially goes quiet toward ZK until reconnection instead of ramping up
its attempts to make something happen with ZK.
- Investigate using the Lucene Segment Replicator replication strategy
in PULL or TLOG replicas to take advantage of its NRT segment replication
feature and its awesome, isolated testing and Lucene integration.
- Investigate combining or dropping calls to ZooKeeper. There are only
so many types of calls that go to ZK, and they only bring back useful
information at a pretty slow rate in computer time. Review calls that are
hitting many times a second, or just generally too often for the
information being exchanged. Updating some timely item once a second is
perhaps reasonable. Trying to do something so fast that the last call has
not even completed, perhaps not. The system should not rely on cluster
state being 100% up to date in the huge majority of cases, which means
rapid updates are likely never very sensible.
- Make it dead simple to set up Solr to log to a JSON output format.
- Look at introducing fair locking into the TransactionLog. When locks
on a path are constantly acquired and released, as happens there,
especially with updates that can depend on each other, you can end up with
old requests’ lock attempts often getting beaten by new requests along the
lock/unlock chain, and this can cause traffic mayhem.
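
For the fair locking item, here is a minimal sketch of what "fair" means in
plain java.util.concurrent terms. FairLogSketch and its methods are made up
for illustration; this is not Solr's TransactionLog, just a lock-guarded
append path using ReentrantLock's fair ordering policy.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a lock-guarded log; not Solr's TransactionLog.
final class FairLogSketch {
    // Passing 'true' selects the fair ordering policy: under contention the
    // longest-waiting thread is favored for the next acquisition, so newer
    // requests are less likely to barge ahead of older ones.
    private final ReentrantLock lock = new ReentrantLock(true);
    private final StringBuilder entries = new StringBuilder();
    private int count;

    void append(String entry) {
        lock.lock();
        try {
            entries.append(entry).append('\n');
            count++;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }

    public static void main(String[] args) throws InterruptedException {
        FairLogSketch log = new FairLogSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            final int id = i;
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    log.append("w" + id + "-" + j);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println("fair=" + log.isFair() + " entries=" + log.size());
    }
}
```

Fair locks usually trade some raw throughput for more predictable ordering,
so a change like this would want before/after measurement under update load.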
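
For the Collection / StringBuilder sizing item, a rough sketch of the kind of
change meant. PresizeSketch and its methods are hypothetical names, not Solr
code; the point is only that the capacity is passed up front when the size
is known or cheaply estimated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for illustration; not taken from Solr.
final class PresizeSketch {
    // Join ids with commas, sizing the builder from a cheap estimate so it
    // does not have to grow its backing array while appending.
    static String joinIds(List<String> ids) {
        int estimate = 0;
        for (String id : ids) {
            estimate += id.length() + 1; // +1 for the separator
        }
        StringBuilder sb = new StringBuilder(estimate);
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ids.get(i));
        }
        return sb.toString();
    }

    // The result has exactly ids.size() elements, so say so up front.
    static List<Integer> lengths(List<String> ids) {
        List<Integer> out = new ArrayList<>(ids.size());
        for (String id : ids) {
            out.add(id.length());
        }
        return out;
    }

    // HashMap resizes once its size exceeds capacity * load factor (0.75 by
    // default), so request enough capacity for ids.size() entries.
    static Map<String, Integer> index(List<String> ids) {
        Map<String, Integer> out = new HashMap<>((int) (ids.size() / 0.75f) + 1);
        for (int i = 0; i < ids.size(); i++) {
            out.put(ids.get(i), i);
        }
        return out;
    }
}
```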
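
For the "step B only runs once async call A has finished" part of the
ZooKeeper async item, here is a sketch of the shape using CompletableFuture
from the JDK. It does not touch the real ZooKeeper client: readNodeAsync is
a made-up stand-in for an async ZK read, and the executor stands in for the
client's I/O thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only; no ZooKeeper classes are used here.
final class AsyncChainSketch {
    // Hypothetical stand-in for async call A (for example, an async read).
    static CompletableFuture<String> readNodeAsync(String path, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> "data@" + path, io);
    }

    // Step B is attached as a continuation, so it consumes no thread and
    // does no work until A has completed.
    static CompletableFuture<Integer> readThenMeasure(String path, ExecutorService io) {
        return readNodeAsync(path, io).thenApply(String::length);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            // join() blocks here only so the demo can print a result.
            System.out.println(readThenMeasure("/live_nodes", io).join());
        } finally {
            io.shutdown();
        }
    }
}
```

The caller gets a future back immediately; nothing waits on a thread between
A and B, which is where the resource savings would come from.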
--
- Mark
http://about.me/markrmiller