Posted to issues@lucene.apache.org by "Noble Paul (Jira)" <ji...@apache.org> on 2019/10/24 21:09:00 UTC

[jira] [Commented] (SOLR-13867) Make Solrcloud stable and performant and capable of having passing tests.

    [ https://issues.apache.org/jira/browse/SOLR-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959219#comment-16959219 ] 

Noble Paul commented on SOLR-13867:
-----------------------------------

I agree with your observations that Solr has become a kitchen sink of a million things. There is feature creep, and there is a lot of badly written code. My suggestion for fixing Solr is to make it modular.

 * Make a _Core Solr_ with only the essential features: indexing, search, cloud-related functionality, a core set of APIs, and an easy way to load/unload modules. All tests in _Core Solr_ must ALWAYS pass.
 * Everything else (DIH, Autoscaling, Streaming, all URPs, analyzers/tokenizers, HDFS, security plugins, CDCR, etc.) moves out to optional modules that can be installed only if and when necessary. Users should be able to pick and choose what they want.
 * Carefully improve the APIs in _Core Solr_ so that modules can depend purely on them. These APIs should maintain backward compatibility (across major versions, if possible) and should be properly documented.
 * Any additions to _Core Solr_ should go through proper vetting, possibly after a vote in the community. Modules can have a more flexible policy.

> Make Solrcloud stable and performant and capable of having passing tests.
> -------------------------------------------------------------------------
>
>                 Key: SOLR-13867
>                 URL: https://issues.apache.org/jira/browse/SOLR-13867
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Major
>             Fix For: master (9.0)
>
>
> After spending a bit of time away from SolrCloud after being deeply involved in trying to stabilize it and its tests, I came back in 2018 and went deep into the system with the Starburst upgrade.
> What I found surprised me, though I guess it should not have. The system is slow, often silly, super buggy, and not good at connection reuse, thread safety, efficient ZooKeeper communication, or efficient startup and shutdown.
> Often, the things we do to make tests pass make things worse, because you can't do things reasonably without some major code work, and so we fight for test passes, not correctness.
> Twice now, I've seen the system in the shape it was supposed to take. FAST. Not bug free, but 100X more solid at least and much, much, much, much faster.
> The current system is sick and actually getting worse under its weight as more is shoveled on top. Even compared to 1.5 years ago, the problems are worse, not better. Tests will never pass. Yes, our tests were in pretty bad shape. But you can put them in the best shape possible and it won't matter. The system will still fail tests.
> Sadly, I'm smart enough to know what has to be done, but not smart enough to keep my work around after addressing most of the problems twice.
> Nonetheless, it's time to fix SolrCloud. It's not supposed to be this way. Twice I've spent a week or two with a super fast SolrCloud. Super fast build system. Development is actually fun. You actually have a chance. I'm talking about tests that have never taken less than 45-60 seconds taking 5. Consistently. A different world.
> I spent a lot of time after Starburst making tests pass for me. Then a lot of time on a better build system that can help us improve development and good practices around the project. And then a lot of time making tests faster. These are important steps, but little itty bitty baby steps that don't address the core rot that is growing. We don't find a problem, fully understand what is going on, and craft a careful solution. We find something we can toss into the Grand Canyon, listen to it bounce around for a while, and if nobody screams, we move on to the next thing. That's not necessarily anyone's choice; there is little else you can do until the system is fixed. Once that happens, we can start making smart changes instead of just shoving the mess around.
> Twice I have made the current system fast. What happens first? Nothing works. The system doesn't know how to be fast. It doesn't have the thread safety or proper logic to be fast. And that is not a place I want to be.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
