Posted to dev@solr.apache.org by Mark Miller <ma...@gmail.com> on 2021/04/19 00:02:56 UTC

A note to those that have tried to get in on this Solr2 / Next Big Thing Effort

Things are, for all intents and purposes, at the point where my train
stops. The next sensible train is likely the one you really wanted to
board. And that is looking at what short-term value, experiments, and
results can be harnessed and converted from the issues I create in a
fork into Solr proper issues.

That is going to be a tough part for me. And it won’t include what some
would like. There is a longer-term thing I’ve found around resource usage
and test performance and stability and the myriad things our tests won’t
catch. That’s high value that is all entangled and comprehensive, and I
won’t be killing myself on a tougher path to that in Solr proper after
already killing myself on one path to it.

But there remains a lot of stuff to look at and take value from. That’s not
something I’m going to be great at driving - I’ll drive stuff that comes
from career-driven needs. So as I drop the notes and make it easier to
test-drive and experiment with the branch, this is where you can add value:
helping to drive what brings value based on your own experiences and
external pressure, and helping to extract that value.

Where I can add value is in helping on that effort. Independently, though,
anything I do will be mostly driven by needs outside my own in the short
term.

Mark

-- 
- Mark

http://about.me/markrmiller

Re: A note to those that have tried to get in on this Solr2 / Next Big Thing Effort

Posted by Mark Miller <ma...@gmail.com>.

Thanks. Feature lists are as hard as time frames. I keep a boatload of
various experiments, dead ends, and potential on the side that would just
muddy the waters until the basics are right. And the basics take 95% of the
time and effort. So it’s 11:59 pm before I know what gets pulled out, to say
nothing of what gets to go in.

It’s really just punches to the face until 30 seconds are left, and then
it’s all the sugar in one shot. ForkJoin, full-on async, all the last mile
on async, JCTools and alternate implementation experimentation, everything
that’s halfway impossible to get right until you hammer on the system like
you only care about performance and scale, because that’s the only way to
expose the ridiculous number of issues that lurk in the slow and
resource-heavy envs. It’s a decade or two of whack-a-mole (done it) or a
death drive to efficiency. You try to play around with async IO and
suspending requests when everything else is not right and you just add new
problems. So any list beyond fast and good was beyond what I could commit to.

Anyway, I couldn’t fully introduce the best stuff without walking through
fire. By the time I get to it I’m about burned up, and when you add it all,
your tests and stability go out the window, because every leap forward
exposed a new world of test problems. And it all has to be tied together at
the moment that they are preparing to deport you and you’re about dead. The
whole ordeal is such a sh*t show. But I crawled out somehow. Now if I touch
the thing, it’s like washing my nice car instead of carrying a large rock on
my head that seems to be getting heavier.

It was the only way I could build a reference for myself to cheat from in
the future. I’m sure it will allow some cheating by others.

-- 
- Mark

http://about.me/markrmiller

Re: A note to those that have tried to get in on this Solr2 / Next Big Thing Effort

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.

This is the most comprehensive list of improvements we've seen so far. I
know from our conversations having discussed many of these goals, but I'm
glad to see them here as a list.
Thanks for your work, Mark!

Re: A note to those that have tried to get in on this Solr2 / Next Big Thing Effort

Posted by Mark Miller <ma...@gmail.com>.

*I’ve taken a step back on this branch for a bit as I engage in other things
and catch a breath.*

Before I dump some of my notes on the background of some of the
technical stuff, I’ll likely make it a bit easier to take the branch for a
spin and put up the remaining code I have.

The branch is essentially as named. A reference branch for Solr scale,
performance, and stability. An investigation into what I missed on
SolrCloud and a preparation to not miss next time.

The high-level goals and deliverables mostly boil down to:

   - Heavily reduced GC and memory usage, and leak sweeps.
   - Heavily reduced reliance on huge numbers of unnecessary threads,
   context switching, and problematic thread management.
   - Large gains in performance and efficiency across the board.
   - Large advances in ZooKeeper usage, behavior, and efficiency.
   - Fast and efficient multi-collection support, scaling to thousands of
   collections and tens of thousands of cores with relative ease compared to
   the past.
   - Hardened and improved recovery and leadership election paths.
   - Fast and stable tests, both standard and nightly.
   - Large improvements in indexing performance and efficiency,
   especially when indexing to multiple replicas.
   - Connection use, stability, and efficiency improvements.
   - Async update and query paths.
   - Improved and hardened HTTP/2 support through the system.
   - Optional async servlet requests, with optional use of async IO.
   - Improved and hardened startup / shutdown and cluster restarts.
   - Efficiencies and improvements around dealing with overload and
   request priority.
   - Improvements, changes, and starting paths to allow for further and
   larger scale while retaining resource control and performance.

And a variety of other things, though it won’t all end up 100% finished.

It will essentially power the next phase of my dev career in Java. But
there may be some fallout for others as well.

-- 
- Mark

http://about.me/markrmiller