Posted to dev@solr.apache.org by Jan Høydahl <ja...@cominvent.com> on 2022/01/12 15:31:06 UTC

Modularizing Solr with new contrib packages

Hi,

I just made an attempt to lift the JWT auth plugin out of solr-core into its own contrib [1], and it wasn't too hard.
I think it gives much better insight into the dependency situation, and it is nice to have a separate solr-jwt-auth-9.0.0.jar.
This is also a first step towards converting it to a proper package; the extraction needs to be done first in any case.

I think there are lots of pieces of code in solr-core that can easily be extracted the same way. 
Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.

To aid in the process I hacked together a Python tool that scaffolds a new contrib module [2].
Go give it a spin and see where YOU can un-bloat solr-core today :)

Related to this, I also suggest in [3] making it easier to add contribs to the classpath when starting Solr. I think users would love it :)
That was inspired by solrOptions.solrModules in Solr's Helm chart for Kubernetes [4].
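
As a rough illustration of the idea in [3] (not the actual implementation), a start script could translate a list of module names into jar paths. The sketch below is in Python; the function names and the <install_dir>/contrib/<name>/lib layout are assumptions made for this example, not something the proposal defines.

```python
from pathlib import Path


def module_lib_dirs(install_dir, modules):
    """Map a comma-separated list of module names to their lib directories.

    Assumes a hypothetical layout: <install_dir>/contrib/<name>/lib
    """
    names = [m.strip() for m in modules.split(",") if m.strip()]
    return [Path(install_dir) / "contrib" / name / "lib" for name in names]


def classpath_for(install_dir, modules):
    """Collect every *.jar found in the selected modules' lib directories."""
    jars = []
    for lib_dir in module_lib_dirs(install_dir, modules):
        if not lib_dir.is_dir():
            # Fail loudly on a misspelled or missing module name.
            raise FileNotFoundError(f"No such module directory: {lib_dir}")
        jars.extend(sorted(lib_dir.glob("*.jar")))
    return jars
```

With a layout like that, classpath_for("/opt/solr", "jwt-auth,ltr") would collect the jars of just those two modules and raise an error for a misspelled module name.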

[1] https://github.com/apache/solr/pull/518
[2] https://github.com/apache/solr/pull/519
[3] https://issues.apache.org/jira/browse/SOLR-15914
[4] https://artifacthub.io/packages/helm/apache-solr/solr#running-solr

Jan
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org


Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
All awesomeness!

Speaking of modularization:
* https://issues.apache.org/jira/browse/SOLR-15904 Move SQLHandler to a
contrib/module/package
-- just a JIRA issue; I don't have time for this one now.
* https://issues.apache.org/jira/browse/SOLR-14660


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
(my previous email was accidentally sent; it was incomplete!)

Speaking of modularization:
* https://issues.apache.org/jira/browse/SOLR-15904 "Move SQLHandler to a
contrib/module/package" -- just a JIRA issue; I don't have time for this
one now.
* https://issues.apache.org/jira/browse/SOLR-14660 "Migrating HDFS into a
package" -- the contributor messaged me a couple of days ago and is committed
to this one; no ETA.  Also, maybe it should cover all of the Hadoop-related
stuff (which expands the scope to some fancy authentication like Kerberos,
which confusingly also uses Hadoop libs).
* https://issues.apache.org/jira/browse/SOLR-15342 "Separate out a
SolrJ-Zookeeper module" -- I'm working with a colleague on this one. I
anticipate something by the end of the week.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Modularizing Solr with new contrib packages

Posted by Alessandro Benedetti <a....@sease.io>.
Thanks, Jan, for all the explanations, and sorry for the off-topic intrusion
on the other thread.
It all makes sense.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr PMC member and Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 21 Jan 2022 at 14:28, David Smiley <ds...@apache.org> wrote:

> Yeah +1 to increase modularization in general.  If for some reason this
> makes the functionality harder to use (which I sympathize with), I think we
> should instead direct our energy to making modules/packages easier to use.
> I'm thrilled about Jan's proposal to simply list the module names on
> bin/solr at startup.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jan 21, 2022 at 8:55 AM Jan Høydahl <ja...@cominvent.com> wrote:
>

Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
Yeah +1 to increase modularization in general.  If for some reason this
makes the functionality harder to use (which I sympathize with), I think we
should instead direct our energy to making modules/packages easier to use.
I'm thrilled about Jan's proposal to simply list the module names on
bin/solr at startup.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 21, 2022 at 8:55 AM Jan Høydahl <ja...@cominvent.com> wrote:


Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
There was a message from Alessandro in another thread <https://lists.apache.org/thread/q014rj1o4cnhq6olr3krnm66q44868x6>, which I'll answer here instead:
(I'm posting the full question below my answers here for context)

Many questions, and definitely a tangent :) But let me try a short answer:
> What makes a module(contrib) a module(contrib)?

Historically I think it actually was some new contribution that we wanted to be optional, perhaps because it was experimental, perhaps due to extra dependencies; I don't know.
I think the funny name "contrib" has kept us from putting new functionality in modules, and instead committers have continued to add all kinds of things to solr-core.
What I'm trying to push for with the dev-doc linked in the original mail is to shift our mindsets.
You should have a really good reason for NOT putting new functionality that is not going to be used by a majority of users into a module.

> From now on I'll use 'module' where I intend a package under contrib.

This is still in flux until https://issues.apache.org/jira/browse/SOLR-15917 lands on main. After that "contrib" will be history both in code and docs.

> I am referring to first-party modules such as ltr or langid.
> My initial understanding was that a module in contrib, is an integration with some external dependency (like langid with OpenNLP, Tika or langdetect).
> But then, why is ltr a module? It doesn't really integrate with any external dependency.

Again, it's all historic reasons here. Bloomberg wisely contributed ltr and analytics as contrib modules. Kudos!

> Then, should this be fixed and brought inside the Solr core?

Rather the opposite. We should lift many more non-core features out of core and into modules. That's why I wrote a scaffoldNewModule.py script
to lower the bar for those wanting to do that. I'm lifting out JWTAuthPlugin in https://issues.apache.org/jira/browse/SOLR-15907 since it is not needed by all users.

> And what about first party/third party modules?
> I don't think there's any visible difference right now, but in case we want to make a difference, should we create a sort of official "Solr Plugin Marketplace" ?

A 1st party package will be (it is still not ready) a module that has a manifest added to it and that can be installed locally via the package manager.

The package manager has a concept of package repos, so you can already add a remote repo today; see the examples in the dev-doc.
See also https://issues.apache.org/jira/browse/SOLR-14688 for a proposed 1st party package design. That JIRA is 1.5 years old :)
The thought is that the Solr project will at some point release its modules as separate JAR files to the download repository, and publish
a repository.json file at solr.apache.org/repository or similar. Then we can release a slim tgz with only solr-core, and users can pull
down the packages they like. Perhaps we'll see some of this materialize in 9.x or at least in 10.0. Until then, all we have is
contribs (soon to be named modules) :)
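
To make the slim-distro idea a bit more concrete, here is a small Python sketch of how a client might pick the newest release of a package from such a repository.json. The JSON layout (a list of packages, each with "name" and "versions", each version with "version" and "artifacts"), the example.org URLs and the function names are all assumptions for illustration; the real format is whatever the package manager defines.

```python
import json

# Hypothetical repository.json content; names, versions and URLs are made up.
SAMPLE_REPOSITORY = """
[
  {
    "name": "jwt-auth",
    "versions": [
      {"version": "9.0.0",
       "artifacts": [{"url": "https://example.org/solr-jwt-auth-9.0.0.jar"}]},
      {"version": "9.10.0",
       "artifacts": [{"url": "https://example.org/solr-jwt-auth-9.10.0.jar"}]},
      {"version": "9.2.0",
       "artifacts": [{"url": "https://example.org/solr-jwt-auth-9.2.0.jar"}]}
    ]
  }
]
"""


def version_key(version):
    """Turn '9.10.0' into (9, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_artifacts(repository_json, package_name):
    """Return (version, [artifact URLs]) for the newest release of a package."""
    for package in json.loads(repository_json):
        if package["name"] == package_name:
            newest = max(package["versions"],
                         key=lambda v: version_key(v["version"]))
            return newest["version"], [a["url"] for a in newest["artifacts"]]
    raise KeyError(f"package not found in repository: {package_name}")
```

Calling latest_artifacts(SAMPLE_REPOSITORY, "jwt-auth") selects the 9.10.0 entry. Comparing versions as integer tuples matters here: a plain string comparison would rank 9.2.0 above 9.10.0.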

Jan

> 21. jan. 2022 kl. 14:17 skrev Alessandro Benedetti <a....@sease.io>:
> 
> I would also add a tangential question (rather than answers at this point):
> What makes a module(contrib) a module(contrib)?
> From now on I'll use 'module' where I intend a package under contrib.
> 
> I am referring to first-party modules such as ltr or langid.
> My initial understanding was that a module in contrib, is an integration with some external dependency (like langid with OpenNLP, Tika or langdetect).
> But then, why is ltr a module? It doesn't really integrate with any external dependency.
> It's additional query parsers and components for a key Solr functionality.
> Is it just a legacy consequence of the fact that initially, Bloomberg contributed the module?
> Maybe this applies to other modules as well (analytics?).
> Then, should this be fixed and brought inside the Solr core?
> 
> And what about first party/third party modules?
> I don't think there's any visible difference right now, but in case we want to make a difference, should we create a sort of official "Solr Plugin Marketplace" ?
> (I proposed the idea to Lucidworks many years ago when I was working for a partner, and for a certain amount of time, I think there was a Solr Plugin Marketplace, but it was proprietary).
> 
> I am curious to understand what you think about this and then reason about the naming convention.
> 
> Cheers



---

Re: Modularizing Solr with new contrib packages

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
+1 to the developer-doc, very neat. Thanks for adding it.

On Sun, Jan 16, 2022 at 10:52 PM Eric Pugh <ep...@opensourceconnections.com>
wrote:

> I found this a great summary!
>
> On Jan 16, 2022, at 9:48 AM, Jan Høydahl <ja...@cominvent.com> wrote:
>
> I added a developer-doc draft for modules and packages in
> https://github.com/apache/solr/pull/531 (HTML preview
> <https://github.com/apache/solr/blob/7eeaba318a79ed62678ab3ac5f1d403733d88e5f/dev-docs/plugins-modules-packages.adoc>).
> Let me know if it is useful.
>
> Jan
>
> 14. jan. 2022 kl. 18:13 skrev David Smiley <ds...@apache.org>:
>
> Fair points.  I might take a stab at this on the weekend to see.
>
> I propose no change to the SOLR_HOME detection logic, which will naturally
> end up being SOLR_INSTALL/server/solr (where solr.xml is).  Docker stuff
> won't need to set it / play games as it does now.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jan 14, 2022 at 9:08 AM Jan Høydahl <ja...@cominvent.com> wrote:
>
>> Hmm, yeah, it's always been a bit odd that SOLR_HOME does not point to where
>> you untarred Solr, i.e. /opt/solr, like for every other piece of software out
>> there. So I support such a change.
>> Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data,
>> or will it point to /var/solr? It's also a bit odd how we don't (I think)
>> have a var pointing to /var/solr as laid out by the install script and in
>> Dockerfile.
>>
>> Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too
>> large for 9.0, since it's not even started. But a JIRA is a good start.
>> Perhaps it is easier than we imagine, and suddenly someone has put up a
>> PR? :)
>>
>> I did not quite get where you wanted the "new" SOLR_HOME to point to. I
>> think if we should change anything, it should point to the root of the Solr
>> installation?
>>
>> Jan
>>
>> 14. jan. 2022 kl. 14:47 skrev David Smiley <ds...@apache.org>:
>>
>> I believe the root cause here is fixed by my "Immutable Infrastructure"
>> adherence proposal relating to a new SOLR_VAR:
>> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0
>> Thus SOLR_HOME stays with the solr installation; mutable data like the
>> indexes go in a new SOLR_VAR -- ultimately the same path to the data that
>> exists today.  But since SOLR_HOME stays with Solr, so does the lib and
>> thus it's easy to mount in some other path or whatever.
>>
>> I didn't create a JIRA issue... I've been extremely busy.  But before I
>> do, WDYT about this?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <ja...@cominvent.com>
>> wrote:
>>
>>> Yep, I have also been using SOLR_HOME/lib for years. But a recent client
>>> needed to package 2-3 plugin jars into the Docker image, so we tried
>>> $SOLR_HOME/lib. Since /var/solr/data is defined as a Docker volume in our
>>> Dockerfile, copying libs to that location in a custom Dockerfile does not
>>> help: at runtime the volume is mounted there instead, and any old jars
>>> already on the volume are used. So we added the libs to an /opt/foo/lib
>>> folder and made an init script in "/docker-entrypoint-initdb.d/" that on
>>> container startup does "rm /var/solr/data/lib/*.jar && cp
>>> /opt/foo/lib/*.jar /var/solr/data/lib/", i.e. it cleans existing jars out
>>> of the docker host's existing volume and copies in the fresh plugin jars
>>> from the newest image. Phew. And the same with solr.xml initialization...
>>>
>>> Of course we could have used
>>> export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib"
>>> or something, but it is still not super easy.
>>> So that's what the new standard location tries to solve - you load code
>>> from a stable path, not together with your data.
>>>
>>> Jan
>>>
>>> 13. jan. 2022 kl. 19:04 skrev David Smiley <ds...@apache.org>:
>>>
>>> +1 to your phasing.
>>>
>>>
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>> the classloader
>>>>
>>>> I'll create a JIRA :)
>>>
>>>
>>> SOLR_HOME/lib is already supported --
>>> https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>>> This is what I recommend people use in general.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <ho...@apache.org>
>>> wrote:
>>>
>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>> Or maybe a zip of each module could be a separate artifact that is
>>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>>
>>>>
>>>> I think for 9.0 we could realistically shoot for 2 binary releases and
>>>> 2 docker images, slim (without the modules) and full-featured (with the
>>>> modules), having the full-featured be the default.
>>>>
>>>> Starting in the 9.x line, we could start packaging the modules as
>>>> separate binary artifacts for the Solr release. Then in 10.x we can make
>>>> the slim release the default (still having the fat tgz available as well,
>>>> as solr-extended-10.0.0.tgz or something like that).
>>>>
>>>>
>>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>>> any of these on the classpath (SOLR-15914
>>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>>> optionally install the modules as 1st party packages instead (still fat
>>>>> distro).
>>>>> Phase 3 (10.0?): Extract even more features as modules, and publish
>>>>> all modules as separate delivery artifacts on DLCDN.
>>>>>
>>>>
>>>> I really like this plan. I agree that for 9.x we really don't have an
>>>> option but to keep publishing the fat tgz as the default. Even in 10.x I
>>>> think we want to offer both a full-featured download and a slim download,
>>>> but with first-party packages we can make slim the "default".
>>>>
>>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>>> the classloader
>>>>>
>>>>> I'll create a JIRA :)
>>>>
>>>>
>>>> Yes please. That would be a lovely improvement! People currently
>>>> bend over backwards to add custom libs.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <ja...@cominvent.com>
>>>> wrote:
>>>>
>>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>>> the classloader, similar to what we have with $SOLR_HOME/lib today. The
>>>>> disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a
>>>>> Docker volume or a different disk, so you cannot e.g. make a Dockerfile like
>>>>>
>>>>> FROM solr:9.0
>>>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>>>>
>>>>> ...since /var/solr/data is a volume and will resolve to the volume
>>>>> partition of the user, not the content from the image. So if we instead
>>>>> allow users to do
>>>>>
>>>>> FROM solr:9.0
>>>>> ADD foo.jar /opt/solr/lib/
>>>>>
>>>>> That is both logical and beautiful, and would always work.
>>>>>
>>>>> I'll create a JIRA :)
>>>>>
>>>>> Jan
>>>>>
>>>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <ja...@cominvent.com>:
>>>>>
>>>>> There is no lack of vision for future local and remote package
>>>>> repositories, but the story is that package management development has
>>>>> stalled, and 1st party packages are out of reach in the 9.0.0 timeframe.
>>>>> So we have to think progress over perfection, once again.
>>>>>
>>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>>> any of these on the classpath (SOLR-15914
>>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>>> optionally install the modules as 1st party packages instead (still fat
>>>>> distro).
>>>>> Phase 3 (10.0?): Extract even more features as modules, and publish
>>>>> all modules as separate delivery artifacts on DLCDN.
>>>>>
>>>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a
>>>>> module in e.g. 9.1, since users upgrading from 9.0 would then get a
>>>>> ClassNotFoundException. That breaks back-compat. But perhaps we could
>>>>> continue modularization efforts in 9.x if we make sure that all new modules
>>>>> extracted in a minor release are automatically added to the classloader?
>>>>> The classes would still disappear from solr-core.jar, which could break
>>>>> someone's custom embedded use case, but 99% of users would be unaffected.
>>>>> Wdyt?
>>>>>
>>>>> In any case, I think for 9.x the realistic route is to keep our fat
>>>>> tgz, but make it slimmer by removing redundancy and pruning the
>>>>> number of overlapping dependencies. That can get us a long way.
>>>>>
>>>>> Jan
>>>>>
>>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <ds...@apache.org>:
>>>>>
>>>>> Shawn:
>>>>> * RE redundancies of stuff in /dist/, see
>>>>> https://issues.apache.org/jira/browse/SOLR-15916
>>>>> * RE "contrib" vs "module" vs "package", see:
>>>>> https://issues.apache.org/jira/browse/SOLR-15917
>>>>> * RE not shipping these extras with the Solr distribution, see: "slim
>>>>> distro" mention in the document "Solr first party packages"
>>>>> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>>>>
>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>> Or maybe a zip of each module could be a separate artifact that is
>>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>>
>>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <ap...@elyograg.org>
>>>>> wrote:
>>>>>
>>>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>>>> > I think there are lots of pieces of code in solr-core that can
>>>>>> > easily be extracted the same way.
>>>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces
>>>>>> > attack surface for most users as well.
>>>>>>
>>>>>> I think it would be really awesome if we had a core download that only
>>>>>> included basic functionality, and all the other fancy things that Solr
>>>>>> does now out of the box (as well as those that are contrib) could be
>>>>>> added after download via package scripting or just additional downloads.
>>>>>>
>>>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip
>>>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0
>>>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the
>>>>>> download is so big ... and a lot of what makes it big are things that
>>>>>> the vast majority of our users will never use.
>>>>>>
>>>>>> Large reductions in the overall size of the main download would be
>>>>>> possible by putting hadoop, calcite, some of the really large lucene
>>>>>> analysis components, and the contrib stuff into packages.  The
>>>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>>>
>>>>>> I would suggest moving zookeeper and its dependencies as well, but I
>>>>>> think we probably want SolrCloud to be part of base functionality.
>>>>>>
>>>>>> Some of the large jars are included for what are probably insignificant
>>>>>> usages, and I wonder if that functionality could be replaced by newer
>>>>>> native functions available in Java 8 and later.  I am eyeballing things
>>>>>> like guava and the commons-* jars here, but I am sure there are other
>>>>>> things in this category.  I'd like to eliminate as many dependencies as
>>>>>> we can.
>>>>>>
>>>>>> Extracting some things from the solr-core jar into other jars sounds
>>>>>> like a really awesome idea.
>>>>>>
>>>>>> I don't think the solr-core jar should be in the dist directory.  It's
>>>>>> useless by itself, because it will still have a LOT of dependencies even
>>>>>> if we shrink it.  And there are likely other things in the dist
>>>>>> directory that fall into that category.  The test framework and its
>>>>>> dependencies are a good candidate for removal.
>>>>>>
>>>>>> By removing some of the low-hanging fruit that I am SURE isn't needed
>>>>>> for base binary functionality on the 8.11.1 download, I was able to end
>>>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little
>>>>>> bit of further reduction is possible if we can fully map out
>>>>>> dependencies.  I think we can leverage gradle to provide some dependency
>>>>>> info.
>>>>>>
>>>>>> Exactly how to organize the code repo to create divided artifacts is
>>>>>> something that we would need to think about.  My initial idea is
>>>>>> changing "contrib" to "package" and then making some new directories
>>>>>> under package.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>
> _______________________
> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
>

Re: Modularizing Solr with new contrib packages

Posted by Eric Pugh <ep...@opensourceconnections.com>.
I found this a great summary!

> On Jan 16, 2022, at 9:48 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> 
> I added a developer-doc draft for modules and packages in https://github.com/apache/solr/pull/531 (HTML preview <https://github.com/apache/solr/blob/7eeaba318a79ed62678ab3ac5f1d403733d88e5f/dev-docs/plugins-modules-packages.adoc>). Let me know if it is useful.
> 
> Jan
> 
>> 14. jan. 2022 kl. 18:13 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>> 
>> Fair points.  I might take a stab at this on the weekend to see.
>> 
>> I propose no change to the SOLR_HOME detection logic, which will naturally end up being SOLR_INSTALL/server/solr (where solr.xml is).  Docker stuff won't need to set it / play games as it does now.
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>> 
>> On Fri, Jan 14, 2022 at 9:08 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
>> Hmm, yea it's always been a bit odd how SOLR_HOME does not point to where you untarred Solr, i.e. /opt/solr, like for every other software out there. So I support such a change.
>> Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data, or will it point to /var/solr? It's also a bit odd how we don't (I think) have a var pointing to /var/solr as laid out by the install script and in Dockerfile.
>> 
>> Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too large for 9.0, since it's not even started. But a JIRA is a good start. Perhaps it is easier than we imagine, and suddenly someone has put up a PR? :)
>> 
>> I did not quite get where you wanted the "new" SOLR_HOME to point to. I think if we should change anything, it should point to the root of the Solr installation?
>> 
>> Jan
>> 
>>> 14. jan. 2022 kl. 14:47 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>> 
>>> I believe the root cause here is fixed by my "Immutable Infrastructure" adherence proposal relating to a new SOLR_VAR:
>>> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0 <https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0>
>>> Thus SOLR_HOME stays with the solr installation; mutable data like the indexes go in a new SOLR_VAR -- ultimately the same path to the data that exists today.  But since SOLR_HOME stays with Solr, so does the lib and thus it's easy to mount in some other path or whatever.
>>> 
>>> I didn't create a JIRA issue... I've been extremely busy.  But before I do, WDYT about this?
>>> 
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>> 
>>> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
>>> Yep, have also been using SOLR_HOME/lib for years. But for a recent client, they needed to package up 2-3 plugin jars into the docker image, so then we tried $SOLR_HOME/lib, but since /var/solr/data is defined as a Docker volume in our Dockerfile, it won't help copying libs in that location in custom Dockerfile, since at runtime the volume location will be used instead, where some old jars would be used instead. So we added the libs to some /opt/foo/lib folder, and made an init-script in "/docker-entrypoint-initdb.d/" that on container startup would do a "rm /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/", i.e. clean up existing jars from the docker-host's existing volume and copy in the fresh plugin jars from the newest image. Phew. And the same with solr.xml initialization...
>>> 
>>> Of course we could have used export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib" or something, but it is still not super easy. So that's what the new standard location tries to solve - you load code from a stable path, not together with your data.
>>> 
>>> Jan
>>> 
>>>> 13. jan. 2022 kl. 19:04 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>>> 
>>>> +1 to your phasing.
>>>>  
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>>>> I'll create a JIRA :) 
>>>> 
>>>> SOLR_HOME/lib is already supported -- https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>>>> This is what I recommend people use in general.
>>>> 
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>>> 
>>>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <houston@apache.org <ma...@apache.org>> wrote:
>>>> It could very well be worth shipping two docker images in the meantime.
>>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>>> 
>>>> I think for 9.0 we could realistically shoot for 2 binary releases and 2 docker images, slim (without the modules) and full-featured (with the modules), having the full-featured be the default.
>>>> 
>>>> Starting in the 9.x line, we could start packaging the modules as separate binary artifacts for the Solr release. Then in 10.x we can make the slim release be the default (still having the fat tgz available as well, as solr-extended-10.0.0.tgz or something like that).
>>>>  
>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
>>>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
>>>> 
>>>> I really like this plan. I agree for 9.x we really don't have an option but to keep publishing the fat tgz as the default. Even in 10.x I think we want to offer both a full-featured download and a slim download, but with first-party packages we can make slim the "default".
>>>> 
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>>>> I'll create a JIRA :) 
>>>> 
>>>> Yes please. That would be a lovely improvement! People currently bend over backward to add custom libs.
>>>> 
>>>> - Houston
>>>> 
>>>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader, similar to what we have with $SOLR_HOME/lib today. The disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a Docker volume or a different disk, so you cannot e.g make a Dockerfile like
>>>> 
>>>> FROM solr:9.0
>>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>>> 
>>>> ...since /var/solr/data is a volume and will resolve to the volume partition of the user, not the content from the image. So if we instead allow users to do
>>>> 
>>>> FROM solr:9.0
>>>> ADD foo.jar /opt/solr/lib/
>>>> 
>>>> That is both logical and beautiful, and would always work.
>>>> 
>>>> I'll create a JIRA :) 
>>>> 
>>>> Jan
>>>> 
>>>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>>:
>>>>> 
>>>>> There is not a lack of vision for future local and remote package repositories, but the story is that package mgmt development has stalled, and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>>>> 
>>>>> So we have to think progress over perfection - once again
>>>>> 
>>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
>>>>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
>>>>> 
>>>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a module in e.g. 9.1, because users upgrading from 9.0 would then get ClassNotFoundException. That breaks back-compat. But perhaps we could continue modularization efforts in 9.x if we make sure that all new modules extracted in a minor release are automatically added to the classloader? Then the classes will disappear from solr-core.jar, which could break someone's custom embedded use case, but 99% of users would be unaffected. Wdyt?
>>>>> 
>>>>> In any case, I think for 9.x the realistic route is to keep our fat tgz, but make it slimmer by removing redundancy and pruning the number of overlapping dependencies. That can get us a long way.
>>>>> 
>>>>> Jan
>>>>> 
>>>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>>>>> 
>>>>>> Shawn:
>>>>>> * RE redundancies of stuff in /dist/, see https://issues.apache.org/jira/browse/SOLR-15916 <https://issues.apache.org/jira/browse/SOLR-15916>
>>>>>> * RE "contrib" vs "module" vs "package", see: https://issues.apache.org/jira/browse/SOLR-15917 <https://issues.apache.org/jira/browse/SOLR-15917>
>>>>>> * RE not shipping these extras with the Solr distribution, see: "slim distro" mention in the document "Solr first party packages" https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing <https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing>
>>>>>> 
>>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>>> 
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>>>>> 
>>>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <apache@elyograg.org <ma...@elyograg.org>> wrote:
>>>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>>>> > I think there are lots of pieces of code in solr-core that can easily be extracted the same way.
>>>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.
>>>>>> 
>>>>>> I think it would be really awesome if we had a core download that only 
>>>>>> included basic functionality, and all the other fancy things that Solr 
>>>>>> does now out of the box (as well as those that are contrib) could be 
>>>>>> added after download via package scripting or just additional downloads.
>>>>>> 
>>>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip 
>>>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0
>>>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the 
>>>>>> download is so big ... and a lot of what makes it big are things that 
>>>>>> the vast majority of our users will never use.
>>>>>> 
>>>>>> Large reductions in the overall size of the main download would be 
>>>>>> possible by putting hadoop, calcite, some of the really large lucene 
>>>>>> analysis components, and the contrib stuff into packages.  The 
>>>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>>> 
>>>>>> I would suggest moving zookeeper and its dependencies as well, but I 
>>>>>> think we probably want SolrCloud to be part of base functionality.
>>>>>> 
>>>>>> Some of the large jars are included for what are probably insignificant 
>>>>>> usages, and I wonder if that functionality could be replaced by newer 
>>>>>> native functions available in Java 8 and later.  I am eyeballing things 
>>>>>> like guava and the commons-* jars here, but I am sure there are other 
>>>>>> things in this category.  I'd like to eliminate as many dependencies as 
>>>>>> we can.
>>>>>> 
>>>>>> Extracting some things from the solr-core jar into other jars sounds 
>>>>>> like a really awesome idea.
>>>>>> 
>>>>>> I don't think the solr-core jar should be in the dist directory.  It's 
>>>>>> useless by itself, because it will still have a LOT of dependencies even 
>>>>>> if we shrink it.  And there are likely other things in the dist 
>>>>>> directory that fall into that category.  The test framework and its 
>>>>>> dependencies are a good candidate for removal.
>>>>>> 
>>>>>> By removing some of the low-hanging fruit that I am SURE isn't needed 
>>>>>> for base binary functionality on the 8.11.1 download, I was able to end 
>>>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little 
>>>>>> bit of further reduction is possible if we can fully map out 
>>>>>> dependencies.  I think we can leverage gradle to provide some dependency 
>>>>>> info.
>>>>>> 
>>>>>> Exactly how to organize the code repo to create divided artifacts is 
>>>>>> something that we would need to think about.  My initial idea is 
>>>>>> changing "contrib" to "package" and then making some new directories 
>>>>>> under package.
>>>>>> 
>>>>>> Thanks,
>>>>>> Shawn
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
I added a developer-doc draft for modules and packages in https://github.com/apache/solr/pull/531 (HTML preview <https://github.com/apache/solr/blob/7eeaba318a79ed62678ab3ac5f1d403733d88e5f/dev-docs/plugins-modules-packages.adoc>). Let me know if it is useful.

Jan

> 14. jan. 2022 kl. 18:13 skrev David Smiley <ds...@apache.org>:
> 
> Fair points.  I might take a stab at this on the weekend to see.
> 
> I propose no change to the SOLR_HOME detection logic, which will naturally end up being SOLR_INSTALL/server/solr (where solr.xml is).  Docker stuff won't need to set it / play games as it does now.
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
> 
> On Fri, Jan 14, 2022 at 9:08 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
> Hmm, yea it's always been a bit odd how SOLR_HOME does not point to where you untarred Solr, i.e. /opt/solr, like for every other software out there. So I support such a change.
> Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data, or will it point to /var/solr? It's also a bit odd how we don't (I think) have a var pointing to /var/solr as laid out by the install script and in Dockerfile.
> 
> Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too large for 9.0, since it's not even started. But a JIRA is a good start. Perhaps it is easier than we imagine, and suddenly someone has put up a PR? :)
> 
> I did not quite get where you wanted the "new" SOLR_HOME to point to. I think if we should change anything, it should point to the root of the Solr installation?
> 
> Jan
> 
>> 14. jan. 2022 kl. 14:47 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>> 
>> I believe the root cause here is fixed by my "Immutable Infrastructure" adherence proposal relating to a new SOLR_VAR:
>> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0 <https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0>
>> Thus SOLR_HOME stays with the solr installation; mutable data like the indexes go in a new SOLR_VAR -- ultimately the same path to the data that exists today.  But since SOLR_HOME stays with Solr, so does the lib and thus it's easy to mount in some other path or whatever.
>> 
>> I didn't create a JIRA issue... I've been extremely busy.  But before I do, WDYT about this?
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>> 
>> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
>> Yep, have also been using SOLR_HOME/lib for years. But for a recent client, they needed to package up 2-3 plugin jars into the docker image, so then we tried $SOLR_HOME/lib, but since /var/solr/data is defined as a Docker volume in our Dockerfile, it won't help copying libs in that location in custom Dockerfile, since at runtime the volume location will be used instead, where some old jars would be used instead. So we added the libs to some /opt/foo/lib folder, and made an init-script in "/docker-entrypoint-initdb.d/" that on container startup would do a "rm /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/", i.e. clean up existing jars from the docker-host's existing volume and copy in the fresh plugin jars from the newest image. Phew. And the same with solr.xml initialization...
>> 
>> Of course we could have used export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib" or something, but it is still not super easy. So that's what the new standard location tries to solve - you load code from a stable path, not together with your data.
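
A minimal sketch of the init-script workaround described above, assuming a POSIX shell. The function name refresh_plugin_jars and the throwaway demo directories are hypothetical; the thread itself used /opt/foo/lib and /var/solr/data/lib. Unlike the quoted one-liner, this version uses rm -f, so an empty volume on first start does not stop the copy.

```shell
#!/bin/sh
# Hypothetical helper: replace the plugin jars on the data volume with the
# jars that were baked into the image.
refresh_plugin_jars() {
  image_lib=$1    # jars shipped in the image, e.g. /opt/foo/lib
  volume_lib=$2   # lib dir on the mounted volume, e.g. /var/solr/data/lib

  mkdir -p "$volume_lib"
  # Drop stale jars left on the volume by an older image.
  rm -f "$volume_lib"/*.jar
  # Copy in the jars from the current image, skipping the unmatched glob.
  for jar in "$image_lib"/*.jar; do
    if [ -e "$jar" ]; then
      cp "$jar" "$volume_lib"/
    fi
  done
  return 0
}

# Demo with temporary directories standing in for the image and the volume.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/image/lib" "$demo_root/volume/lib"
touch "$demo_root/image/lib/plugin-2.0.jar"    # jar from the new image
touch "$demo_root/volume/lib/plugin-1.0.jar"   # stale jar on the volume
refresh_plugin_jars "$demo_root/image/lib" "$demo_root/volume/lib"
ls "$demo_root/volume/lib"
```

In a real image the function body would presumably live in a script under /docker-entrypoint-initdb.d/ and be called with the two paths from the thread. Loading jars from a fixed path inside the image would make the whole step unnecessary, which is the point of the proposal.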
>> 
>> Jan
>> 
>>> 13. jan. 2022 kl. 19:04 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>> 
>>> +1 to your phasing.
>>>  
>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>>> I'll create a JIRA :) 
>>> 
>>> SOLR_HOME/lib is already supported -- https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>>> This is what I recommend people use in general.
>>> 
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>> 
>>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <houston@apache.org <ma...@apache.org>> wrote:
>>> It could very well be worth shipping two docker images in the meantime.
>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>> 
>>> I think for 9.0 we could realistically shoot for 2 binary releases and 2 docker images, slim (without the modules) and full-featured (with the modules), having the full-featured be the default.
>>> 
>>> Starting in the 9.x line, we could start packaging the modules as separate binary artifacts for the Solr release. Then in 10.x we can make the slim release be the default (still having the fat tgz available as well, as solr-extended-10.0.0.tgz or something like that).
>>>  
>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
>>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
>>> 
>>> I really like this plan. I agree for 9.x we really don't have an option but to keep publishing the fat tgz as the default. Even in 10.x I think we want to offer both a full-featured download and a slim download, but with first-party packages we can make slim the "default".
>>> 
>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>>> I'll create a JIRA :) 
>>> 
>>> Yes please. That would be a lovely improvement! People currently bend over backward to add custom libs.
>>> 
>>> - Houston
>>> 
>>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader, similar to what we have with $SOLR_HOME/lib today. The disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a Docker volume or a different disk, so you cannot e.g make a Dockerfile like
>>> 
>>> FROM solr:9.0
>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>> 
>>> ...since /var/solr/data is a volume and will resolve to the volume partition of the user, not the content from the image. So if we instead allow users to do
>>> 
>>> FROM solr:9.0
>>> ADD foo.jar /opt/solr/lib/
>>> 
>>> That is both logical and beautiful, and would always work.
>>> 
>>> I'll create a JIRA :) 
>>> 
>>> Jan
>>> 
>>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>>:
>>>> 
>>>> There is not a lack of vision for future local and remote package repositories, but the story is that package mgmt development has stalled, and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>>> 
>>>> So we have to think progress over perfection - once again
>>>> 
>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
>>>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
>>>> 
>>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a module in e.g. 9.1, because users upgrading from 9.0 would then get ClassNotFoundException. That breaks back-compat. But perhaps we could continue modularization efforts in 9.x if we make sure that all new modules extracted in a minor release are automatically added to the classloader? Then the classes will disappear from solr-core.jar, which could break someone's custom embedded use case, but 99% of users would be unaffected. Wdyt?
>>>> 
>>>> In any case, I think for 9.x the realistic route is to keep our fat tgz, but make it slimmer by removing redundancy and pruning the number of overlapping dependencies. That can get us a long way.
>>>> 
>>>> Jan
>>>> 
>>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>>>> 
>>>>> Shawn:
>>>>> * RE redundancies of stuff in /dist/, see https://issues.apache.org/jira/browse/SOLR-15916 <https://issues.apache.org/jira/browse/SOLR-15916>
>>>>> * RE "contrib" vs "module" vs "package", see: https://issues.apache.org/jira/browse/SOLR-15917 <https://issues.apache.org/jira/browse/SOLR-15917>
>>>>> * RE not shipping these extras with the Solr distribution, see: "slim distro" mention in the document "Solr first party packages" https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing <https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing>
>>>>> 
>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>> 
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>>>> 
>>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <apache@elyograg.org <ma...@elyograg.org>> wrote:
>>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>>> > I think there are lots of pieces of code in solr-core that can easily be extracted the same way.
>>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.
>>>>> 
>>>>> I think it would be really awesome if we had a core download that only 
>>>>> included basic functionality, and all the other fancy things that Solr 
>>>>> does now out of the box (as well as those that are contrib) could be 
>>>>> added after download via package scripting or just additional downloads.
>>>>> 
>>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip 
>>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0
>>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the 
>>>>> download is so big ... and a lot of what makes it big are things that 
>>>>> the vast majority of our users will never use.
>>>>> 
>>>>> Large reductions in the overall size of the main download would be 
>>>>> possible by putting hadoop, calcite, some of the really large lucene 
>>>>> analysis components, and the contrib stuff into packages.  The 
>>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>> 
>>>>> I would suggest moving zookeeper and its dependencies as well, but I 
>>>>> think we probably want SolrCloud to be part of base functionality.
>>>>> 
>>>>> Some of the large jars are included for what are probably insignificant 
>>>>> usages, and I wonder if that functionality could be replaced by newer 
>>>>> native functions available in Java 8 and later.  I am eyeballing things 
>>>>> like guava and the commons-* jars here, but I am sure there are other 
>>>>> things in this category.  I'd like to eliminate as many dependencies as 
>>>>> we can.
>>>>> 
>>>>> Extracting some things from the solr-core jar into other jars sounds 
>>>>> like a really awesome idea.
>>>>> 
>>>>> I don't think the solr-core jar should be in the dist directory.  It's 
>>>>> useless by itself, because it will still have a LOT of dependencies even 
>>>>> if we shrink it.  And there are likely other things in the dist 
>>>>> directory that fall into that category.  The test framework and its 
>>>>> dependencies are a good candidate for removal.
>>>>> 
>>>>> By removing some of the low-hanging fruit that I am SURE isn't needed 
>>>>> for base binary functionality on the 8.11.1 download, I was able to end 
>>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little 
>>>>> bit of further reduction is possible if we can fully map out 
>>>>> dependencies.  I think we can leverage gradle to provide some dependency 
>>>>> info.
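
The size figures quoted above can be checked with a little arithmetic. The sketch below, assuming a POSIX shell with awk, converts the quoted byte count to MiB and computes the relative change between the sizes mentioned in the message. The helper names bytes_to_mib and pct_change are made up for illustration. The 60.4MiB figure is for a .zip while 207MiB is for the .tgz, so the last comparison is only approximate.

```shell
#!/bin/sh
# Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes), two decimals.
bytes_to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024) }'
}

# Percentage change from an old size ($1) to a new size ($2), one decimal.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

bytes_to_mib 218076598   # solr-8.11.1.tgz, byte count as quoted
pct_change 53.7 207      # growth from 1.4.1 to 8.11.1, in percent
pct_change 207 60.4      # trimmed 8.11.1 .zip versus the full .tgz, in percent
```

By this calculation the download has nearly quadrupled since 1.4.1, and the trimmed archive is roughly 70% smaller than the full one.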
>>>>> 
>>>>> Exactly how to organize the code repo to create divided artifacts is 
>>>>> something that we would need to think about.  My initial idea is 
>>>>> changing "contrib" to "package" and then making some new directories 
>>>>> under package.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: Modularizing Solr with new contrib packages

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
I wish to revive SOLR-14688, and make sure your immutable deployments are
honoured for first party packages. I'll take a stab at it over the weekend.

On Fri, 14 Jan, 2022, 10:44 pm David Smiley, <ds...@apache.org> wrote:

> Fair points.  I might take a stab at this on the weekend to see.
>
> I propose no change to the SOLR_HOME detection logic, which will naturally
> end up being SOLR_INSTALL/server/solr (where solr.xml is).  Docker stuff
> won't need to set it / play games as it does now.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jan 14, 2022 at 9:08 AM Jan Høydahl <ja...@cominvent.com> wrote:
>
>> Hmm, yea it's always been a bit odd how SOLR_HOME does not point to where
>> you untarred Solr, i.e. /opt/solr, like for every other software out there.
>> So I support such a change.
>> Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data,
>> or will it point to /var/solr? It's also a bit odd how we don't (I think)
>> have a var pointing to /var/solr as laid out by the install script and in
>> Dockerfile.
>>
>> Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too
>> large for 9.0, since it's not even started. But a JIRA is a good start.
>> Perhaps it is easier than we imagine, and suddenly someone has put up a
>> PR? :)
>>
>> I did not quite get where you wanted the "new" SOLR_HOME to point to. I
>> think if we should change anything, it should point to the root of the Solr
>> installation?
>>
>> Jan
>>
>> 14. jan. 2022 kl. 14:47 skrev David Smiley <ds...@apache.org>:
>>
>> I believe the root cause here is fixed by my "Immutable Infrastructure"
>> adherence proposal relating to a new SOLR_VAR:
>> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0
>> Thus SOLR_HOME stays with the solr installation; mutable data like the
>> indexes go in a new SOLR_VAR -- ultimately the same path to the data that
>> exists today.  But since SOLR_HOME stays with Solr, so does the lib and
>> thus it's easy to mount in some other path or whatever.
>>
>> I didn't create a JIRA issue... I've been extremely busy.  But before I
>> do, WDYT about this?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <ja...@cominvent.com>
>> wrote:
>>
>>> Yep, have also been using SOLR_HOME/lib for years. But for a recent
>>> client, they needed to package up 2-3 plugin jars into the docker image, so
>>> then we tried $SOLR_HOME/lib. But since /var/solr/data is defined as a
>>> Docker volume in our Dockerfile, copying libs to that location in a custom
>>> Dockerfile doesn't help: at runtime the volume is mounted over it, so
>>> whatever old jars are on the volume get used instead. So we added the
>>> libs to some /opt/foo/lib folder, and made an init-script in
>>> "/docker-entrypoint-initdb.d/" that on container startup would do a "rm
>>> /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/",
>>> i.e. clean up existing jars from the docker-host's existing volume and copy
>>> in the fresh plugin jars from the newest image. Phew. And the same with
>>> solr.xml initialization...
>>>
>>> Of course we could have used
>>> export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib"
>>> or something, but it is still not super easy.
>>> So that's what the new standard location tries to solve - you load code
>>> from a stable path, not together with your data.
>>>
>>> Jan
>>>
>>> 13. jan. 2022 kl. 19:04 skrev David Smiley <ds...@apache.org>:
>>>
>>> +1 to your phasing.
>>>
>>>
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>> the classloader
>>>
>>> I'll create a JIRA :)
>>>
>>>
>>> SOLR_HOME/lib is already supported --
>>> https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>>> This is what I recommend people use in general.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <ho...@apache.org>
>>> wrote:
>>>
>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>> Or maybe a zip of each module could be a separate artifact that is
>>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>>
>>>>
>>>> I think for 9.0 we could realistically shoot for 2 binary releases and
>>>> 2 docker images, slim (without the modules) and full-featured (with the
>>>> modules), having the full-featured be the default.
>>>>
>>>> Starting in the 9.x line, we could start packaging the modules as
>>>> separate binary artifacts for the solr release. Then in 10.x we can make
>>>> the slim release be the default (still having the fat tgz available as well,
>>>> as solr-extended-10.0.0.tgz or something like that).
>>>>
>>>>
>>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>>> any of these on the classpath (SOLR-15914
>>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>>> optionally install the modules as 1st party packages instead (still fat
>>>>> distro).
>>>>> Phase 3 (10.0?): Extract even more features as modules, and publish
>>>>> all modules as separate delivery artifacts on DLCDN.
>>>>>
>>>>
>>>> I really like this plan. I agree that for 9.x we really don't have an
>>>> option but to keep publishing the fat tgz as the default. Even in 10.x I
>>>> think we want to offer both a full-featured download and a slim download,
>>>> but with first-party packages we can make slim the "default".
>>>>
>>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>>> the classloader
>>>>>
>>>>> I'll create a JIRA :)
>>>>
>>>>
>>>> Yes please. That would be a lovely improvement! People currently
>>>> bend over backwards to add custom libs.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <ja...@cominvent.com>
>>>> wrote:
>>>>
>>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>>> the classloader, similar to what we have with $SOLR_HOME/lib today. The
>>>>> disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a
>>>>> Docker volume or a different disk, so you cannot e.g. make a Dockerfile like
>>>>>
>>>>> FROM solr:9.0
>>>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>>>>
>>>>> ...since /var/solr/data is a volume and will resolve to the volume
>>>>> partition of the user, not the content from the image. So if we instead
>>>>> allow users to do
>>>>>
>>>>> FROM solr:9.0
>>>>> ADD foo.jar /opt/solr/lib/
>>>>>
>>>>> That is both logical and beautiful, and would always work.
>>>>>
>>>>> I'll create a JIRA :)
>>>>>
>>>>> Jan
>>>>>
>>>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <ja...@cominvent.com>:
>>>>>
>>>>> There is not a lack of vision for future local and remote package
>>>>> repositories, but the story is that package mgmt development has stalled,
>>>>> and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>>>> So we have to think progress over perfection - once again.
>>>>>
>>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>>> any of these on the classpath (SOLR-15914
>>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>>> optionally install the modules as 1st party packages instead (still fat
>>>>> distro).
>>>>> Phase 3 (10.0?): Extract even more features as modules, and publish
>>>>> all modules as separate delivery artifacts on DLCDN.
>>>>>
>>>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a
>>>>> module in e.g. 9.1, since users upgrading from 9.0 would then get a
>>>>> ClassNotFoundException. That breaks back-compat. But perhaps we could
>>>>> continue modularization efforts in 9.x if we make sure that all new modules
>>>>> extracted in a minor release are automatically added to the classloader?
>>>>> Then the classes will disappear from solr-core.jar, which would possibly
>>>>> break someone's custom embedded usecase, but 99% of users would be
>>>>> unaffected. Wdyt?
>>>>>
>>>>> In any case, I think for 9.x the realistic route is to keep our fat
>>>>> tgz, but make it slimmer by removing redundancy and pruning the
>>>>> number of overlapping dependencies. That can get us a long way.
>>>>>
>>>>> Jan
>>>>>
>>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <ds...@apache.org>:
>>>>>
>>>>> Shawn:
>>>>> * RE redundancies of stuff in /dist/, see
>>>>> https://issues.apache.org/jira/browse/SOLR-15916
>>>>> * RE "contrib" vs "module" vs "package", see:
>>>>> https://issues.apache.org/jira/browse/SOLR-15917
>>>>> * RE not shipping these extras with the Solr distribution, see: "slim
>>>>> distro" mention in the document "Solr first party packages"
>>>>> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>>>>
>>>>> It could very well be worth shipping two docker images in the meantime.
>>>>> Or maybe a zip of each module could be a separate artifact that is
>>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>>
>>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <ap...@elyograg.org>
>>>>> wrote:
>>>>>
>>>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>>>> > I think there are lots of pieces of code in solr-core that can
>>>>>> easily be extracted the same way.
>>>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces
>>>>>> attack surface for most users as well.
>>>>>>
>>>>>> I think it would be really awesome if we had a core download that
>>>>>> only
>>>>>> included basic functionality, and all the other fancy things that
>>>>>> Solr
>>>>>> does now out of the box (as well as those that are contrib) could be
>>>>>> added after download via package scripting or just additional
>>>>>> downloads.
>>>>>>
>>>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip
>>>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB,
>>>>>> 6.0.0
>>>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the
>>>>>> download is so big ... and a lot of what makes it big are things that
>>>>>> the vast majority of our users will never use.
>>>>>>
>>>>>> Large reductions in the overall size of the main download would be
>>>>>> possible by putting hadoop, calcite, some of the really large lucene
>>>>>> analysis components, and the contrib stuff into packages.  The
>>>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>>>
>>>>>> I would suggest moving zookeeper and its dependencies as well, but I
>>>>>> think we probably want SolrCloud to be part of base functionality.
>>>>>>
>>>>>> Some of the large jars are included for what are probably
>>>>>> insignificant
>>>>>> usages, and I wonder if that functionality could be replaced by newer
>>>>>> native functions available in Java 8 and later.  I am eyeballing
>>>>>> things
>>>>>> like guava and the commons-* jars here, but I am sure there are other
>>>>>> things in this category.  I'd like to eliminate as many dependencies
>>>>>> as
>>>>>> we can.
>>>>>>
>>>>>> Extracting some things from the solr-core jar into other jars sounds
>>>>>> like a really awesome idea.
>>>>>>
>>>>>> I don't think the solr-core jar should be in the dist directory.
>>>>>> It's
>>>>>> useless by itself, because it will still have a LOT of dependencies
>>>>>> even
>>>>>> if we shrink it.  And there are likely other things in the dist
>>>>>> directory that fall into that category.  The test framework and its
>>>>>> dependencies are a good candidate for removal.
>>>>>>
>>>>>> By removing some of the low-hanging fruit that I am SURE isn't needed
>>>>>> for base binary functionality on the 8.11.1 download, I was able to
>>>>>> end
>>>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a
>>>>>> little
>>>>>> bit of further reduction is possible if we can fully map out
>>>>>> dependencies.  I think we can leverage gradle to provide some
>>>>>> dependency
>>>>>> info.
>>>>>>
>>>>>> Exactly how to organize the code repo to create divided artifacts is
>>>>>> something that we would need to think about.  My initial idea is
>>>>>> changing "contrib" to "package" and then making some new directories
>>>>>> under package.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
>>>>>>

Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
Fair points.  I might take a stab at this on the weekend to see.

I propose no change to the SOLR_HOME detection logic, which will naturally
end up being SOLR_INSTALL/server/solr (where solr.xml is).  Docker stuff
won't need to set it / play games as it does now.
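
To make the intended default concrete, here is a minimal shell sketch. It is
not the actual bin/solr code; the function name resolve_solr_home and the
install-dir argument (standing in for SOLR_INSTALL) are assumptions made for
illustration. An explicitly set SOLR_HOME still wins, otherwise the value
falls back to server/solr under the install directory.

```shell
#!/bin/sh
# Hypothetical sketch, not the real bin/solr logic.
# Usage: resolve_solr_home INSTALL_DIR
# Prints the effective SOLR_HOME: an explicitly set SOLR_HOME wins,
# otherwise default to INSTALL_DIR/server/solr (where solr.xml lives).
resolve_solr_home() {
  printf '%s\n' "${SOLR_HOME:-$1/server/solr}"
}

# Example call for an install in /opt/solr.
resolve_solr_home /opt/solr
```

With this shape a Docker image would not have to export SOLR_HOME at all,
as long as Solr is installed in the expected place.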

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 14, 2022 at 9:08 AM Jan Høydahl <ja...@cominvent.com> wrote:

> Hmm, yeah, it's always been a bit odd how SOLR_HOME does not point to where
> you untarred Solr, i.e. /opt/solr, like for every other software out there.
> So I support such a change.
> Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data,
> or will it point to /var/solr? It's also a bit odd how we don't (I think)
> have a var pointing to /var/solr as laid out by the install script and in
> Dockerfile.
>
> Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too
> large for 9.0, since it's not even started. But a JIRA is a good start.
> Perhaps it is easier than we imagine, and suddenly someone has put up a
> PR? :)
>
> I did not quite get where you wanted the "new" SOLR_HOME to point to. I
> think if we should change anything, it should point to the root of the Solr
> installation?
>
> Jan
>
> 14. jan. 2022 kl. 14:47 skrev David Smiley <ds...@apache.org>:
>
> I believe the root cause here is fixed by my "Immutable Infrastructure"
> adherence proposal relating to a new SOLR_VAR:
> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0
> Thus SOLR_HOME stays with the solr installation; mutable data like the
> indexes go in a new SOLR_VAR -- ultimately the same path to the data that
> exists today.  But since SOLR_HOME stays with Solr, so does the lib and
> thus it's easy to mount in some other path or whatever.
>
> I didn't create a JIRA issue... I've been extremely busy.  But before I
> do, WDYT about this?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <ja...@cominvent.com> wrote:
>
>> Yep, have also been using SOLR_HOME/lib for years. But for a recent
>> client, they needed to package up 2-3 plugin jars into the docker image, so
>> then we tried $SOLR_HOME/lib. But since /var/solr/data is defined as a
>> Docker volume in our Dockerfile, copying libs to that location in a custom
>> Dockerfile doesn't help: at runtime the volume is mounted over it, so
>> whatever old jars are on the volume get used instead. So we added the
>> libs to some /opt/foo/lib folder, and made an init-script in
>> "/docker-entrypoint-initdb.d/" that on container startup would do a "rm
>> /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/",
>> i.e. clean up existing jars from the docker-host's existing volume and copy
>> in the fresh plugin jars from the newest image. Phew. And the same with
>> solr.xml initialization...
>>
>> Of course we could have used
>> export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib"
>> or something, but it is still not super easy.
>> So that's what the new standard location tries to solve - you load code
>> from a stable path, not together with your data.
>>
>> Jan
>>
>> 13. jan. 2022 kl. 19:04 skrev David Smiley <ds...@apache.org>:
>>
>> +1 to your phasing.
>>
>>
>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>> the classloader
>>
>> I'll create a JIRA :)
>>
>>
>> SOLR_HOME/lib is already supported --
>> https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>> This is what I recommend people use in general.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <ho...@apache.org>
>> wrote:
>>
>>>> It could very well be worth shipping two docker images in the meantime.
>>>> Or maybe a zip of each module could be a separate artifact that is
>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>
>>>
>>> I think for 9.0 we could realistically shoot for 2 binary releases and 2
>>> docker images, slim (without the modules) and full-featured (with the
>>> modules), having the full-featured be the default.
>>>
>>> Starting in the 9.x line, we could start packaging the modules as
>>> separate binary artifacts for the solr release. Then in 10.x we can make
>>> the slim release be the default (still having the fat tgz available as well,
>>> as solr-extended-10.0.0.tgz or something like that).
>>>
>>>
>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>> any of these on the classpath (SOLR-15914
>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>> optionally install the modules as 1st party packages instead (still fat
>>>> distro).
>>>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>>>> modules as separate delivery artifacts on DLCDN.
>>>>
>>>
>>> I really like this plan. I agree that for 9.x we really don't have an option
>>> but to keep publishing the fat tgz as the default. Even in 10.x I think we
>>> want to offer both a full-featured download and a slim download, but with
>>> first-party packages we can make slim the "default".
>>>
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>> the classloader
>>>>
>>>> I'll create a JIRA :)
>>>
>>>
>>> Yes please. That would be a lovely improvement! People currently
>>> bend over backwards to add custom libs.
>>>
>>> - Houston
>>>
>>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <ja...@cominvent.com>
>>> wrote:
>>>
>>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>>> the classloader, similar to what we have with $SOLR_HOME/lib today. The
>>>> disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a
>>>> Docker volume or a different disk, so you cannot e.g. make a Dockerfile like
>>>>
>>>> FROM solr:9.0
>>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>>>
>>>> ...since /var/solr/data is a volume and will resolve to the volume
>>>> partition of the user, not the content from the image. So if we instead
>>>> allow users to do
>>>>
>>>> FROM solr:9.0
>>>> ADD foo.jar /opt/solr/lib/
>>>>
>>>> That is both logical and beautiful, and would always work.
>>>>
>>>> I'll create a JIRA :)
>>>>
>>>> Jan
>>>>
>>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <ja...@cominvent.com>:
>>>>
>>>> There is not a lack of vision for future local and remote package
>>>> repositories, but the story is that package mgmt development has stalled,
>>>> and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>>> So we have to think progress over perfection - once again.
>>>>
>>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>>> plugins into contribs/modules. Make it super easy to launch Solr with
>>>> any of these on the classpath (SOLR-15914
>>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>>> Phase 2 (9.x): Evolve the package manager and make it possible to
>>>> optionally install the modules as 1st party packages instead (still fat
>>>> distro).
>>>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>>>> modules as separate delivery artifacts on DLCDN.
>>>>
>>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a
>>>> module in e.g. 9.1, since users upgrading from 9.0 would then get a
>>>> ClassNotFoundException. That breaks back-compat. But perhaps we could
>>>> continue modularization efforts in 9.x if we make sure that all new modules
>>>> extracted in a minor release are automatically added to the classloader?
>>>> Then the classes will disappear from solr-core.jar, which would possibly
>>>> break someone's custom embedded usecase, but 99% of users would be
>>>> unaffected. Wdyt?
>>>>
>>>> In any case, I think for 9.x the realistic route is to keep our fat
>>>> tgz, but make it slimmer by removing redundancy and pruning the
>>>> number of overlapping dependencies. That can get us a long way.
>>>>
>>>> Jan
>>>>
>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <ds...@apache.org>:
>>>>
>>>> Shawn:
>>>> * RE redundancies of stuff in /dist/, see
>>>> https://issues.apache.org/jira/browse/SOLR-15916
>>>> * RE "contrib" vs "module" vs "package", see:
>>>> https://issues.apache.org/jira/browse/SOLR-15917
>>>> * RE not shipping these extras with the Solr distribution, see: "slim
>>>> distro" mention in the document "Solr first party packages"
>>>> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>>>
>>>> It could very well be worth shipping two docker images in the meantime.
>>>> Or maybe a zip of each module could be a separate artifact that is
>>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <ap...@elyograg.org>
>>>> wrote:
>>>>
>>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>>> > I think there are lots of pieces of code in solr-core that can
>>>>> easily be extracted the same way.
>>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces
>>>>> attack surface for most users as well.
>>>>>
>>>>> I think it would be really awesome if we had a core download that only
>>>>> included basic functionality, and all the other fancy things that Solr
>>>>> does now out of the box (as well as those that are contrib) could be
>>>>> added after download via package scripting or just additional
>>>>> downloads.
>>>>>
>>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip
>>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB,
>>>>> 6.0.0
>>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the
>>>>> download is so big ... and a lot of what makes it big are things that
>>>>> the vast majority of our users will never use.
>>>>>
>>>>> Large reductions in the overall size of the main download would be
>>>>> possible by putting hadoop, calcite, some of the really large lucene
>>>>> analysis components, and the contrib stuff into packages.  The
>>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>>
>>>>> I would suggest moving zookeeper and its dependencies as well, but I
>>>>> think we probably want SolrCloud to be part of base functionality.
>>>>>
>>>>> Some of the large jars are included for what are probably
>>>>> insignificant
>>>>> usages, and I wonder if that functionality could be replaced by newer
>>>>> native functions available in Java 8 and later.  I am eyeballing
>>>>> things
>>>>> like guava and the commons-* jars here, but I am sure there are other
>>>>> things in this category.  I'd like to eliminate as many dependencies
>>>>> as
>>>>> we can.
>>>>>
>>>>> Extracting some things from the solr-core jar into other jars sounds
>>>>> like a really awesome idea.
>>>>>
>>>>> I don't think the solr-core jar should be in the dist directory.  It's
>>>>> useless by itself, because it will still have a LOT of dependencies
>>>>> even
>>>>> if we shrink it.  And there are likely other things in the dist
>>>>> directory that fall into that category.  The test framework and its
>>>>> dependencies are a good candidate for removal.
>>>>>
>>>>> By removing some of the low-hanging fruit that I am SURE isn't needed
>>>>> for base binary functionality on the 8.11.1 download, I was able to
>>>>> end
>>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a
>>>>> little
>>>>> bit of further reduction is possible if we can fully map out
>>>>> dependencies.  I think we can leverage gradle to provide some
>>>>> dependency
>>>>> info.
>>>>>
>>>>> Exactly how to organize the code repo to create divided artifacts is
>>>>> something that we would need to think about.  My initial idea is
>>>>> changing "contrib" to "package" and then making some new directories
>>>>> under package.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>

Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
Hmm, yeah, it's always been a bit odd how SOLR_HOME does not point to where you untarred Solr, i.e. /opt/solr, like for every other software out there. So I support such a change.
Will SOLR_VAR be exactly what the old SOLR_HOME was, i.e. /var/solr/data, or will it point to /var/solr? It's also a bit odd how we don't (I think) have a var pointing to /var/solr as laid out by the install script and in Dockerfile.

Such a change will have to happen either in 9.0 or 10.0. Sounds a tad too large for 9.0, since it's not even started. But a JIRA is a good start. Perhaps it is easier than we imagine, and suddenly someone has put up a PR? :)

I did not quite get where you wanted the "new" SOLR_HOME to point to. I think if we should change anything, it should point to the root of the Solr installation?

Jan

> 14. jan. 2022 kl. 14:47 skrev David Smiley <ds...@apache.org>:
> 
> I believe the root cause here is fixed by my "Immutable Infrastructure" adherence proposal relating to a new SOLR_VAR:
> https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0
> Thus SOLR_HOME stays with the solr installation; mutable data like the indexes go in a new SOLR_VAR -- ultimately the same path to the data that exists today.  But since SOLR_HOME stays with Solr, so does the lib and thus it's easy to mount in some other path or whatever.
> 
> I didn't create a JIRA issue... I've been extremely busy.  But before I do, WDYT about this?
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
> 
> On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <jan.asf@cominvent.com> wrote:
> Yep, have also been using SOLR_HOME/lib for years. But for a recent client, they needed to package up 2-3 plugin jars into the docker image, so then we tried $SOLR_HOME/lib. But since /var/solr/data is defined as a Docker volume in our Dockerfile, copying libs to that location in a custom Dockerfile doesn't help: at runtime the volume is mounted over it, so whatever old jars are on the volume get used instead. So we added the libs to some /opt/foo/lib folder, and made an init-script in "/docker-entrypoint-initdb.d/" that on container startup would do a "rm /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/", i.e. clean up existing jars from the docker-host's existing volume and copy in the fresh plugin jars from the newest image. Phew. And the same with solr.xml initialization...
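
For anyone wanting to reproduce the init-script workaround described above, a minimal sketch could look like this. The function name refresh_plugin_jars is made up for illustration; only the two paths in the example call come from the description above. Unlike the plain "rm ... && cp ..." one-liner, it uses rm -f, so that an empty lib folder on a fresh volume does not abort the copy.

```shell
#!/bin/sh
# Hypothetical init-script sketch (the function name is illustrative).
# Usage: refresh_plugin_jars SRC_DIR DEST_DIR
# Replaces every *.jar in DEST_DIR with the *.jar files found in SRC_DIR.
refresh_plugin_jars() {
  # Make sure the destination exists, e.g. on a brand new volume.
  mkdir -p "$2" || return 1
  # Drop stale jars left on the volume by an older image; -f tolerates "no match".
  rm -f "$2"/*.jar
  # Copy in the jars shipped with the current image (fails if SRC_DIR has none).
  cp "$1"/*.jar "$2"/
}

# On container startup one would call it with the paths from the description:
#   refresh_plugin_jars /opt/foo/lib /var/solr/data/lib
```

It works, but it is exactly the kind of extra moving part that a stable lib location inside the image would make unnecessary.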
> 
> Of course we could have used export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib" or something, but it is still not super easy. So that's what the new standard location tries to solve - you load code from a stable path, not together with your data.
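
As a rough sketch of that SOLR_OPTS alternative: the helper name add_shared_lib is hypothetical, and it assumes that solr.xml picks up the solr.sharedLib system property. The point of the helper is the quoting, since an unquoted "export SOLR_OPTS=$SOLR_OPTS -D..." would pass the -D part to export as a separate argument.

```shell
#!/bin/sh
# Sketch only; add_shared_lib is an illustrative name, not a Solr script.
# Usage: add_shared_lib LIB_DIR
# Appends -Dsolr.sharedLib=LIB_DIR to SOLR_OPTS, keeping any existing options.
# Assumes solr.xml reads the solr.sharedLib system property.
add_shared_lib() {
  # Only add a separating space if SOLR_OPTS already has content.
  SOLR_OPTS="${SOLR_OPTS:+$SOLR_OPTS }-Dsolr.sharedLib=$1"
  export SOLR_OPTS
}

# Example with the path mentioned above.
add_shared_lib /opt/foo/lib
echo "$SOLR_OPTS"
```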
> 
> Jan
> 
>> 13. jan. 2022 kl. 19:04 skrev David Smiley <dsmiley@apache.org>:
>> 
>> +1 to your phasing.
>>  
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>> I'll create a JIRA :) 
>> 
>> SOLR_HOME/lib is already supported -- https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
>> This is what I recommend people use in general.
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>> 
>> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <houston@apache.org> wrote:
>> It could very well be worth shipping two docker images in the meantime.
>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>> 
>> I think for 9.0 we could realistically shoot for 2 binary releases and 2 docker images, slim (without the modules) and full-featured (with the modules), having the full-featured be the default.
>> 
>> Starting in the 9.x line, we could start packaging the modules as separate binary artifacts for the solr release. Then in 10.x we can make the slim release be the default (still having the fat tgz available as well, as solr-extended-10.0.0.tgz or something like that).
>>  
>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro).
>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN.
>> 
>> I really like this plan. I agree that for 9.x we really don't have an option but to keep publishing the fat tgz as the default. Even in 10.x I think we want to offer both a full-featured download and a slim download, but with first-party packages we can make slim the "default".
>> 
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
>> I'll create a JIRA :) 
>> 
>> Yes please. That would be a lovely improvement! People currently bend over backwards to add custom libs.
>> 
>> - Houston
>> 
>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <jan.asf@cominvent.com> wrote:
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader, similar to what we have with $SOLR_HOME/lib today. The disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a Docker volume or a different disk, so you cannot e.g. make a Dockerfile like
>> 
>> FROM solr:9.0
>> ADD foo.jar /var/solr/data/lib/foo.jar
>> 
>> ...since /var/solr/data is a volume and will resolve to the volume partition of the user, not the content from the image. So if we instead allow users to do
>> 
>> FROM solr:9.0
>> ADD foo.jar /opt/solr/lib/
>> 
>> That is both logical and beautiful, and would always work.
>> 
>> I'll create a JIRA :) 
>> 
>> Jan
>> 
>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <jan.asf@cominvent.com>:
>>> 
>>> There is not a lack of vision for future local and remote package repositories, but the story is that package mgmt development has stalled, and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>> 
>>> So we have to think progress over perfection - once again.
>>> 
>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>>> Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro).
>>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN.
>>> 
>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a module in e.g. 9.1, since users upgrading from 9.0 would then get a ClassNotFoundException. That breaks back-compat. But perhaps we could continue modularization efforts in 9.x if we make sure that all new modules extracted in a minor release are automatically added to the classloader? Then the classes will disappear from solr-core.jar, which would possibly break someone's custom embedded usecase, but 99% of users would be unaffected. Wdyt?
>>> 
>>> In any case, I think for 9.x the realistic route is to keep our fat tgz, but make it slimmer by removing redundancy and pruning the number of overlapping dependencies. That can get us a long way.
>>> 
>>> Jan
>>> 
>>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <dsmiley@apache.org>:
>>>> 
>>>> Shawn:
>>>> * RE redundancies of stuff in /dist/, see https://issues.apache.org/jira/browse/SOLR-15916
>>>> * RE "contrib" vs "module" vs "package", see: https://issues.apache.org/jira/browse/SOLR-15917
>>>> * RE not shipping these extras with the Solr distribution, see: "slim distro" mention in the document "Solr first party packages" https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>>> 
>>>> It could very well be worth shipping two docker images in the meantime.
>>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>>> 
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>>> 
>>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <apache@elyograg.org <ma...@elyograg.org>> wrote:
>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>> > I think there are lots of pieces of code in solr-core that can easily be extracted the same way.
>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.
>>>> 
>>>> I think it would be really awesome if we had a core download that only 
>>>> included basic functionality, and all the other fancy things that Solr 
>>>> does now out of the box (as well as those that are contrib) could be 
>>>> added after download via package scripting or just additional downloads.
>>>> 
>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip 
>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0 
>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the 
>>>> download is so big ... and a lot of what makes it big are things that 
>>>> the vast majority of our users will never use.
>>>> 
>>>> Large reductions in the overall size of the main download would be 
>>>> possible by putting hadoop, calcite, some of the really large lucene 
>>>> analysis components, and the contrib stuff into packages.  The 
>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>> 
>>>> I would suggest moving zookeeper and its dependencies as well, but I 
>>>> think we probably want SolrCloud to be part of base functionality.
>>>> 
>>>> Some of the large jars are included for what are probably insignificant 
>>>> usages, and I wonder if that functionality could be replaced by newer 
>>>> native functions available in Java 8 and later.  I am eyeballing things 
>>>> like guava and the commons-* jars here, but I am sure there are other 
>>>> things in this category.  I'd like to eliminate as many dependencies as 
>>>> we can.
>>>> 
>>>> Extracting some things from the solr-core jar into other jars sounds 
>>>> like a really awesome idea.
>>>> 
>>>> I don't think the solr-core jar should be in the dist directory.  It's 
>>>> useless by itself, because it will still have a LOT of dependencies even 
>>>> if we shrink it.  And there are likely other things in the dist 
>>>> directory that fall into that category.  The test framework and its 
>>>> dependencies are a good candidate for removal.
>>>> 
>>>> By removing some of the low-hanging fruit that I am SURE isn't needed 
>>>> for base binary functionality on the 8.11.1 download, I was able to end 
>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little 
>>>> bit of further reduction is possible if we can fully map out 
>>>> dependencies.  I think we can leverage gradle to provide some dependency 
>>>> info.
>>>> 
>>>> Exactly how to organize the code repo to create divided artifacts is 
>>>> something that we would need to think about.  My initial idea is 
>>>> changing "contrib" to "package" and then making some new directories 
>>>> under package.
>>>> 
>>>> Thanks,
>>>> Shawn
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org <ma...@solr.apache.org>
>>>> For additional commands, e-mail: dev-help@solr.apache.org <ma...@solr.apache.org>
>>>> 
>>> 
>> 
> 


Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
I believe the root cause here is fixed by my "Immutable Infrastructure"
adherence proposal relating to a new SOLR_VAR:
https://lists.apache.org/thread/3vvld3xnndtthtl7sfgdbsgkbtpm55b0
Thus SOLR_HOME stays with the solr installation; mutable data like the
indexes go in a new SOLR_VAR -- ultimately the same path to the data that
exists today.  But since SOLR_HOME stays with Solr, so does the lib and
thus it's easy to mount in some other path or whatever.

I didn't create a JIRA issue... I've been extremely busy.  But before I do,
WDYT about this?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 14, 2022 at 4:20 AM Jan Høydahl <ja...@cominvent.com> wrote:

> Yep, I have also been using SOLR_HOME/lib for years. But a recent client
> needed to package 2-3 plugin jars into the Docker image, so we tried
> $SOLR_HOME/lib. Since /var/solr/data is defined as a Docker volume in our
> Dockerfile, copying libs to that location in a custom Dockerfile does not
> help: at runtime the volume is mounted there instead, and whatever old jars
> it holds are the ones that get loaded. So we added the libs to an
> /opt/foo/lib folder, and made an init-script in
> "/docker-entrypoint-initdb.d/" that on container startup would do a "rm
> /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/",
> i.e. clean up existing jars from the docker-host's existing volume and copy
> in the fresh plugin jars from the newest image. Phew. And the same with
> solr.xml initialization...
>
> Of course we could have used export SOLR_OPTS="$SOLR_OPTS
> -Dsolr.sharedLib=/opt/foo/lib" or something, but it is still not super easy.
> So that's what the new standard location tries to solve - you load code
> from a stable path, not together with your data.
>
> Jan
>
> 13. jan. 2022 kl. 19:04 skrev David Smiley <ds...@apache.org>:
>
> +1 to your phasing.
>
>
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
>> classloader
>
> I'll create a JIRA :)
>
>
> SOLR_HOME/lib is already supported --
> https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
> This is what I recommend people use in general.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <ho...@apache.org>
> wrote:
>
>> It could very well be worth shipping two docker images in the meantime.
>>> Or maybe a zip of each module could be a separate artifact that is
>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>
>>
>> I think for 9.0 we could realistically shoot for 2 binary releases and 2
>> docker images, slim (without the modules) and full-featured (with the
>> modules), having the full-featured be the default.
>>
>> Starting in the 9.x line, we could start packaging the modules as
>> separate binary artifacts for the solr release. Then in 10.x we can make
>> the slim release be the default (still having the fat tgz available as well,
>> as solr-extended-10.0.0.tgz or something like that).
>>
>>
>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>> plugins into contribs/modules. Make it super easy to launch Solr with any
>>> of these on the class-path (SOLR-15914
>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>> Phase 2 (9.x): Evolve package manager and make it possible to optionally
>>> install the modules as 1st party packages instead (still fat distro)
>>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>>> modules as separate delivery artifacts on DLCDN
>>>
>>
>> I really like this plan. I agree that for 9.x we really don't have an option
>> but to keep publishing the fat tgz as the default. Even in 10.x I think we
>> want to offer both a full-featured download and a slim download, but with
>> first-party packages we can make slim the "default".
>>
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
>>> classloader
>>
>> I'll create a JIRA :)
>>
>>
>> Yes please. That would be a lovely improvement! People bend-over-backward
>> currently to add custom libs.
>>
>> - Houston
>>
>> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <ja...@cominvent.com>
>> wrote:
>>
>>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to
>>> the classloader, similar to what we have with $SOLR_HOME/lib today. The
>>> disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a
>>> Docker volume or a different disk, so you cannot e.g make a Dockerfile like
>>>
>>> FROM solr:9.0
>>> ADD foo.jar /var/solr/data/lib/foo.jar
>>>
>>> ...since /var/solr/data is a volume and will resolve to the volume
>>> partition of the user, not the content from the image. So if we instead
>>> allow users to do
>>>
>>> FROM solr:9.0
>>> ADD foo.jar /opt/solr/lib/
>>>
>>> That is both logical and beautiful, and would always work.
>>>
>>> I'll create a JIRA :)
>>>
>>> Jan
>>>
>>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <ja...@cominvent.com>:
>>>
>>> There is not a lack of vision for future local and remote package
>>> repositories, but the story is that package mgmt development has stalled,
>>> and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>>> So we have to think progress over perfection - once again
>>>
>>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>>> plugins into contribs/modules. Make it super easy to launch Solr with any
>>> of these on the class-path (SOLR-15914
>>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>>> Phase 2 (9.x): Evolve package manager and make it possible to optionally
>>> install the modules as 1st party packages instead (still fat distro)
>>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>>> modules as separate delivery artifacts on DLCDN
>>>
>>> Regarding phase 2 in 9.x: we cannot really extract a feature into a
>>> module in e.g. 9.1, because users upgrading from 9.0 would then get a
>>> ClassNotFoundException. That breaks back-compat. But perhaps we could
>>> continue modularization efforts in 9.x if we make sure that all new modules
>>> extracted in a minor release are automatically added to the classloader?
>>> Then the classes will disappear from solr-core.jar, which could break
>>> someone's custom embedded use case, but 99% of users would be unaffected.
>>> Wdyt?
>>>
>>> In any case, I think for 9.x the realistic route is to keep our fat tgz,
>>> but make it slimmer by removing redundancy and prune down on the number of
>>> overlapping dependencies. That can get us a long way.
>>>
>>> Jan
>>>
>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <ds...@apache.org>:
>>>
>>> Shawn:
>>> * RE redundancies of stuff in /dist/, see
>>> https://issues.apache.org/jira/browse/SOLR-15916
>>> * RE "contrib" vs "module" vs "package", see:
>>> https://issues.apache.org/jira/browse/SOLR-15917
>>> * RE not shipping these extras with the Solr distribution, see: "slim
>>> distro" mention in the document "Solr first party packages"
>>> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>>
>>> It could very well be worth shipping two docker images in the meantime.
>>> Or maybe a zip of each module could be a separate artifact that is
>>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <ap...@elyograg.org>
>>> wrote:
>>>
>>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>>> > I think there are lots of pieces of code in solr-core that can easily
>>>> be extracted the same way.
>>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces
>>>> attack surface for most users as well.
>>>>
>>>> I think it would be really awesome if we had a core download that only
>>>> included basic functionality, and all the other fancy things that Solr
>>>> does now out of the box (as well as those that are contrib) could be
>>>> added after download via package scripting or just additional downloads.
>>>>
>>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip
>>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0
>>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the
>>>> download is so big ... and a lot of what makes it big are things that
>>>> the vast majority of our users will never use.
>>>>
>>>> Large reductions in the overall size of the main download would be
>>>> possible by putting hadoop, calcite, some of the really large lucene
>>>> analysis components, and the contrib stuff into packages.  The
>>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>>
>>>> I would suggest moving zookeeper and its dependencies as well, but I
>>>> think we probably want SolrCloud to be part of base functionality.
>>>>
>>>> Some of the large jars are included for what are probably insignificant
>>>> usages, and I wonder if that functionality could be replaced by newer
>>>> native functions available in Java 8 and later.  I am eyeballing things
>>>> like guava and the commons-* jars here, but I am sure there are other
>>>> things in this category.  I'd like to eliminate as many dependencies as
>>>> we can.
>>>>
>>>> Extracting some things from the solr-core jar into other jars sounds
>>>> like a really awesome idea.
>>>>
>>>> I don't think the solr-core jar should be in the dist directory.  It's
>>>> useless by itself, because it will still have a LOT of dependencies
>>>> even
>>>> if we shrink it.  And there are likely other things in the dist
>>>> directory that fall into that category.  The test framework and its
>>>> dependencies are a good candidate for removal.
>>>>
>>>> By removing some of the low-hanging fruit that I am SURE isn't needed
>>>> for base binary functionality on the 8.11.1 download, I was able to end
>>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a
>>>> little
>>>> bit of further reduction is possible if we can fully map out
>>>> dependencies.  I think we can leverage gradle to provide some
>>>> dependency
>>>> info.
>>>>
>>>> Exactly how to organize the code repo to create divided artifacts is
>>>> something that we would need to think about.  My initial idea is
>>>> changing "contrib" to "package" and then making some new directories
>>>> under package.
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>>
>>>>
>>>
>>>
>

Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
Yep, I have also been using SOLR_HOME/lib for years. But a recent client needed to package 2-3 plugin jars into the Docker image, so we tried $SOLR_HOME/lib. Since /var/solr/data is defined as a Docker volume in our Dockerfile, copying libs to that location in a custom Dockerfile does not help: at runtime the volume is mounted there instead, and whatever old jars it holds are the ones that get loaded. So we added the libs to an /opt/foo/lib folder, and made an init-script in "/docker-entrypoint-initdb.d/" that on container startup would do a "rm /var/solr/data/lib/*.jar && cp /opt/foo/lib/*.jar /var/solr/data/lib/", i.e. clean up existing jars from the docker-host's existing volume and copy in the fresh plugin jars from the newest image. Phew. And the same with solr.xml initialization...

Of course we could have used export SOLR_OPTS="$SOLR_OPTS -Dsolr.sharedLib=/opt/foo/lib" or something, but it is still not super easy. So that's what the new standard location tries to solve - you load code from a stable path, not together with your data.
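
For illustration, the init-script workaround described above could look roughly like the sketch below. This is not the script we actually used: the function name refresh_plugin_jars and the PLUGIN_SRC / PLUGIN_DEST variables are made up for this example, and the defaults are simply the /opt/foo/lib and /var/solr/data/lib paths mentioned above. It assumes a POSIX shell and that the file is dropped into /docker-entrypoint-initdb.d/.

```shell
#!/bin/sh
# Hypothetical init script (names are illustrative, not part of Solr or its image).

# refresh_plugin_jars SRC_DIR DEST_DIR
# Removes every *.jar in DEST_DIR, then copies every *.jar from SRC_DIR into it.
# Files in DEST_DIR that are not jars are left alone.
refresh_plugin_jars() {
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  # Drop stale jars that an older image left behind on the volume.
  rm -f "$dest_dir"/*.jar
  # Copy the jars baked into the current image; do nothing if there are none.
  for jar in "$src_dir"/*.jar; do
    [ -e "$jar" ] || continue
    cp "$jar" "$dest_dir"/ || return 1
  done
}

# Defaults taken from the paths in this mail; both can be overridden.
PLUGIN_SRC=${PLUGIN_SRC:-/opt/foo/lib}
PLUGIN_DEST=${PLUGIN_DEST:-/var/solr/data/lib}

# Only touch the volume when the image really ships a plugin folder,
# so a missing source folder never wipes the existing jars.
if [ -d "$PLUGIN_SRC" ]; then
  refresh_plugin_jars "$PLUGIN_SRC" "$PLUGIN_DEST"
fi
```

The point of the sketch is that the jars have to be re-synced on every container start, because the volume outlives the image. With a stable code location inside the image, none of this would be needed.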

Jan

> 13. jan. 2022 kl. 19:04 skrev David Smiley <ds...@apache.org>:
> 
> +1 to your phasing.
>  
> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
> I'll create a JIRA :) 
> 
> SOLR_HOME/lib is already supported -- https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
> This is what I recommend people use in general.
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
> 
> On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <houston@apache.org <ma...@apache.org>> wrote:
> It could very well be worth shipping two docker images in the meantime.
> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
> 
> I think for 9.0 we could realistically shoot for 2 binary releases and 2 docker images, slim (without the modules) and full-featured (with the modules), having the full-featured be the default.
> 
> Starting in the 9.x line, we could start packaging the modules as separate binary artifacts for the solr release. Then in 10.x we can make the slim release be the default (still having the fat tgz available as well, as solr-extended-10.0.0.tgz or something like that).
>  
> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the class-path (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
> Phase 2 (9.x): Evolve package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
> 
> I really like this plan. I agree that for 9.x we really don't have an option but to keep publishing the fat tgz as the default. Even in 10.x I think we want to offer both a full-featured download and a slim download, but with first-party packages we can make slim the "default".
> 
> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader
> I'll create a JIRA :) 
> 
> Yes please. That would be a lovely improvement! People bend-over-backward currently to add custom libs.
> 
> - Houston
> 
> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>> wrote:
> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the classloader, similar to what we have with $SOLR_HOME/lib today. The disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a Docker volume or a different disk, so you cannot e.g make a Dockerfile like
> 
> FROM solr:9.0
> ADD foo.jar /var/solr/data/lib/foo.jar
> 
> ...since /var/solr/data is a volume and will resolve to the volume partition of the user, not the content from the image. So if we instead allow users to do
> 
> FROM solr:9.0
> ADD foo.jar /opt/solr/lib/
> 
> That is both logical and beautiful, and would always work.
> 
> I'll create a JIRA :) 
> 
> Jan
> 
>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <jan.asf@cominvent.com <ma...@cominvent.com>>:
>> 
>> There is not a lack of vision for future local and remote package repositories, but the story is that package mgmt development has stalled, and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>> 
>> So we have to think progress over perfection - once again
>> 
>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the class-path (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
>> Phase 2 (9.x): Evolve package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro)
>> Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN
>> 
>> Regarding phase 2 in 9.x: we cannot really extract a feature into a module in e.g. 9.1, because users upgrading from 9.0 would then get a ClassNotFoundException. That breaks back-compat. But perhaps we could continue modularization efforts in 9.x if we make sure that all new modules extracted in a minor release are automatically added to the classloader? Then the classes will disappear from solr-core.jar, which could break someone's custom embedded use case, but 99% of users would be unaffected. Wdyt?
>> 
>> In any case, I think for 9.x the realistic route is to keep our fat tgz, but make it slimmer by removing redundancy and prune down on the number of overlapping dependencies. That can get us a long way.
>> 
>> Jan
>> 
>>> 13. jan. 2022 kl. 03:15 skrev David Smiley <dsmiley@apache.org <ma...@apache.org>>:
>>> 
>>> Shawn:
>>> * RE redundancies of stuff in /dist/, see https://issues.apache.org/jira/browse/SOLR-15916 <https://issues.apache.org/jira/browse/SOLR-15916>
>>> * RE "contrib" vs "module" vs "package", see: https://issues.apache.org/jira/browse/SOLR-15917 <https://issues.apache.org/jira/browse/SOLR-15917>
>>> * RE not shipping these extras with the Solr distribution, see: "slim distro" mention in the document "Solr first party packages" https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing <https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing>
>>> 
>>> It could very well be worth shipping two docker images in the meantime.
>>> Or maybe a zip of each module could be a separate artifact that is published?  I'm not sure what freedoms we have to do this in the ASF.
>>> 
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley <http://www.linkedin.com/in/davidwsmiley>
>>> 
>>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <apache@elyograg.org <ma...@elyograg.org>> wrote:
>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>> > I think there are lots of pieces of code in solr-core that can easily be extracted the same way.
>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.
>>> 
>>> I think it would be really awesome if we had a core download that only 
>>> included basic functionality, and all the other fancy things that Solr 
>>> does now out of the box (as well as those that are contrib) could be 
>>> added after download via package scripting or just additional downloads.
>>> 
>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip 
>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0 
>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the 
>>> download is so big ... and a lot of what makes it big are things that 
>>> the vast majority of our users will never use.
>>> 
>>> Large reductions in the overall size of the main download would be 
>>> possible by putting hadoop, calcite, some of the really large lucene 
>>> analysis components, and the contrib stuff into packages.  The 
>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>> 
>>> I would suggest moving zookeeper and its dependencies as well, but I 
>>> think we probably want SolrCloud to be part of base functionality.
>>> 
>>> Some of the large jars are included for what are probably insignificant 
>>> usages, and I wonder if that functionality could be replaced by newer 
>>> native functions available in Java 8 and later.  I am eyeballing things 
>>> like guava and the commons-* jars here, but I am sure there are other 
>>> things in this category.  I'd like to eliminate as many dependencies as 
>>> we can.
>>> 
>>> Extracting some things from the solr-core jar into other jars sounds 
>>> like a really awesome idea.
>>> 
>>> I don't think the solr-core jar should be in the dist directory.  It's 
>>> useless by itself, because it will still have a LOT of dependencies even 
>>> if we shrink it.  And there are likely other things in the dist 
>>> directory that fall into that category.  The test framework and its 
>>> dependencies are a good candidate for removal.
>>> 
>>> By removing some of the low-hanging fruit that I am SURE isn't needed 
>>> for base binary functionality on the 8.11.1 download, I was able to end 
>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little 
>>> bit of further reduction is possible if we can fully map out 
>>> dependencies.  I think we can leverage gradle to provide some dependency 
>>> info.
>>> 
>>> Exactly how to organize the code repo to create divided artifacts is 
>>> something that we would need to think about.  My initial idea is 
>>> changing "contrib" to "package" and then making some new directories 
>>> under package.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org <ma...@solr.apache.org>
>>> For additional commands, e-mail: dev-help@solr.apache.org <ma...@solr.apache.org>
>>> 
>> 
> 


Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
+1 to your phasing.


> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
> classloader

I'll create a JIRA :)


SOLR_HOME/lib is already supported --
https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/libs.html
This is what I recommend people use in general.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Jan 13, 2022 at 10:59 AM Houston Putman <ho...@apache.org> wrote:

> It could very well be worth shipping two docker images in the meantime.
>> Or maybe a zip of each module could be a separate artifact that is
>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>
>
> I think for 9.0 we could realistically shoot for 2 binary releases and 2
> docker images, slim (without the modules) and full-featured (with the
> modules), having the full-featured be the default.
>
> Starting in the 9.x line, we could start packaging the modules as separate
> binary artifacts for the solr release. Then in 10.x we can make the slim
> release be the default (still having the fat tgz available as well, as
> solr-extended-10.0.0.tgz or something like that).
>
>
>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>> plugins into contribs/modules. Make it super easy to launch Solr with any
>> of these on the class-path (SOLR-15914
>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>> Phase 2 (9.x): Evolve package manager and make it possible to optionally
>> install the modules as 1st party packages instead (still fat distro)
>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>> modules as separate delivery artifacts on DLCDN
>>
>
> I really like this plan. I agree that for 9.x we really don't have an option
> but to keep publishing the fat tgz as the default. Even in 10.x I think we
> want to offer both a full-featured download and a slim download, but with
> first-party packages we can make slim the "default".
>
> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
>> classloader
>
> I'll create a JIRA :)
>
>
> Yes please. That would be a lovely improvement! People bend-over-backward
> currently to add custom libs.
>
> - Houston
>
> On Thu, Jan 13, 2022 at 8:09 AM Jan Høydahl <ja...@cominvent.com> wrote:
>
>> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
>> classloader, similar to what we have with $SOLR_HOME/lib today. The
>> disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a
>> Docker volume or a different disk, so you cannot e.g make a Dockerfile like
>>
>> FROM solr:9.0
>> ADD foo.jar /var/solr/data/lib/foo.jar
>>
>> ...since /var/solr/data is a volume and will resolve to the volume
>> partition of the user, not the content from the image. So if we instead
>> allow users to do
>>
>> FROM solr:9.0
>> ADD foo.jar /opt/solr/lib/
>>
>> That is both logical and beautiful, and would always work.
>>
>> I'll create a JIRA :)
>>
>> Jan
>>
>> 13. jan. 2022 kl. 13:57 skrev Jan Høydahl <ja...@cominvent.com>:
>>
>> There is not a lack of vision for future local and remote package
>> repositories, but the story is that package mgmt development has stalled,
>> and is out of reach for 1st party pkgs in the 9.0.0 timeframe.
>> So we have to think progress over perfection - once again
>>
>> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
>> plugins into contribs/modules. Make it super easy to launch Solr with any
>> of these on the class-path (SOLR-15914
>> <https://issues.apache.org/jira/browse/SOLR-15914>).
>> Phase 2 (9.x): Evolve package manager and make it possible to optionally
>> install the modules as 1st party packages instead (still fat distro)
>> Phase 3 (10.0?): Extract even more features as modules, and publish all
>> modules as separate delivery artifacts on DLCDN
>>
>> Regarding phase 2 in 9.x: we cannot really extract a feature into a
>> module in e.g. 9.1, because users upgrading from 9.0 would then get a
>> ClassNotFoundException. That breaks back-compat. But perhaps we could
>> continue modularization efforts in 9.x if we make sure that all new modules
>> extracted in a minor release are automatically added to the classloader?
>> Then the classes will disappear from solr-core.jar, which could break
>> someone's custom embedded use case, but 99% of users would be unaffected.
>> Wdyt?
>>
>> In any case, I think for 9.x the realistic route is to keep our fat tgz,
>> but make it slimmer by removing redundancy and prune down on the number of
>> overlapping dependencies. That can get us a long way.
>>
>> Jan
>>
>> 13. jan. 2022 kl. 03:15 skrev David Smiley <ds...@apache.org>:
>>
>> Shawn:
>> * RE redundancies of stuff in /dist/, see
>> https://issues.apache.org/jira/browse/SOLR-15916
>> * RE "contrib" vs "module" vs "package", see:
>> https://issues.apache.org/jira/browse/SOLR-15917
>> * RE not shipping these extras with the Solr distribution, see: "slim
>> distro" mention in the document "Solr first party packages"
>> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
>>
>> It could very well be worth shipping two docker images in the meantime.
>> Or maybe a zip of each module could be a separate artifact that is
>> published?  I'm not sure what freedoms we have to do this in the ASF.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Wed, Jan 12, 2022 at 8:21 PM Shawn Heisey <ap...@elyograg.org> wrote:
>>
>>> On 1/12/2022 8:31 AM, Jan Høydahl wrote:
>>> > I think there are lots of pieces of code in solr-core that can easily
>>> be extracted the same way.
>>> > Some perhaps even for 9.0.0, as it slims down the core and reduces
>>> attack surface for most users as well.
>>>
>>> I think it would be really awesome if we had a core download that only
>>> included basic functionality, and all the other fancy things that Solr
>>> does now out of the box (as well as those that are contrib) could be
>>> added after download via package scripting or just additional downloads.
>>>
>>> The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip
>>> version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0
>>> was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the
>>> download is so big ... and a lot of what makes it big are things that
>>> the vast majority of our users will never use.
>>>
>>> Large reductions in the overall size of the main download would be
>>> possible by putting hadoop, calcite, some of the really large lucene
>>> analysis components, and the contrib stuff into packages.  The
>>> extraction contrib alone is 43.5MiB compressed in zip format.
>>>
>>> I would suggest moving zookeeper and its dependencies as well, but I
>>> think we probably want SolrCloud to be part of base functionality.
>>>
>>> Some of the large jars are included for what are probably insignificant
>>> usages, and I wonder if that functionality could be replaced by newer
>>> native functions available in Java 8 and later.  I am eyeballing things
>>> like guava and the commons-* jars here, but I am sure there are other
>>> things in this category.  I'd like to eliminate as many dependencies as
>>> we can.
>>>
>>> Extracting some things from the solr-core jar into other jars sounds
>>> like a really awesome idea.
>>>
>>> I don't think the solr-core jar should be in the dist directory.  It's
>>> useless by itself, because it will still have a LOT of dependencies even
>>> if we shrink it.  And there are likely other things in the dist
>>> directory that fall into that category.  The test framework and its
>>> dependencies are a good candidate for removal.
>>>
>>> By removing some of the low-hanging fruit that I am SURE isn't needed
>>> for base binary functionality on the 8.11.1 download, I was able to end
>>> up with a .zip file sized in at 60.4MiB, and I am sure at least a little
>>> bit of further reduction is possible if we can fully map out
>>> dependencies.  I think we can leverage gradle to provide some dependency
>>> info.
>>>
>>> Exactly how to organize the code repo to create divided artifacts is
>>> something that we would need to think about.  My initial idea is
>>> changing "contrib" to "package" and then making some new directories
>>> under package.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
>>> For additional commands, e-mail: dev-help@solr.apache.org
>>>
>>>
>>
>>

Re: Modularizing Solr with new contrib packages

Posted by Houston Putman <ho...@apache.org>.
>
> It could very well be worth shipping two docker images in the meantime.
> Or maybe a zip of each module could be a separate artifact that is
> published?  I'm not sure what freedoms we have to do this in the ASF.
>

I think for 9.0 we could realistically shoot for 2 binary releases and 2
docker images, slim (without the modules) and full-featured (with the
modules), having the full-featured be the default.

Starting in the 9.x line, we could start packaging the modules as separate
binary artifacts for the Solr release. Then in 10.x we can make the slim
release the default (still having the fat tgz available as well, as
solr-extended-10.0.0.tgz or something like that).


> Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit
> plugins into contribs/modules. Make it super easy to launch Solr with any of
> these on the classpath (SOLR-15914
> <https://issues.apache.org/jira/browse/SOLR-15914>).
> Phase 2 (9.x): Evolve the package manager and make it possible to optionally
> install the modules as 1st party packages instead (still fat distro).
> Phase 3 (10.0?): Extract even more features as modules, and publish all
> modules as separate delivery artifacts on DLCDN.
>

I really like this plan. I agree that for 9.x we really don't have an option
but to keep publishing the fat tgz as the default. Even in 10.x I think we
want to offer both a full-featured download and a slim download, but with
first-party packages we can make slim the "default".

> Another minor improvement for users is if we pre-add $SOLR_TIP/lib to the
> classloader
>
> I'll create a JIRA :)


Yes please. That would be a lovely improvement! People currently bend over
backward to add custom libs.

- Houston


Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
Another minor improvement for users would be to pre-add $SOLR_TIP/lib to the classloader, similar to what we have with $SOLR_HOME/lib today. The disadvantage of $SOLR_HOME/lib is that it can be anywhere, perhaps on a Docker volume or a different disk, so you cannot, e.g., write a Dockerfile like

FROM solr:9.0
ADD foo.jar /var/solr/data/lib/foo.jar

...since /var/solr/data is a volume and will resolve to the volume partition of the user, not the content from the image. So instead we could allow users to do

FROM solr:9.0
ADD foo.jar /opt/solr/lib/

That is both logical and beautiful, and would always work.
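
As a rough illustration of the lookup being proposed (not Solr's actual classloader code), the sketch below collects jars from an install-level lib directory first and a home-level lib directory second. The function name `collect_lib_jars` and the ordering are assumptions made for this example only.

```python
from pathlib import Path


def collect_lib_jars(install_dir, solr_home):
    """Return jar paths from <install_dir>/lib followed by <solr_home>/lib.

    Hypothetical helper: directories that do not exist are skipped, which is
    the situation when a home directory lives on a volume that is not mounted.
    """
    jars = []
    for lib_dir in (Path(install_dir) / "lib", Path(solr_home) / "lib"):
        if lib_dir.is_dir():
            # Sort so the resulting classpath order is stable between runs.
            jars.extend(sorted(p for p in lib_dir.iterdir() if p.suffix == ".jar"))
    return jars


if __name__ == "__main__":
    # Paths taken from the Dockerfile snippets above.
    for jar in collect_lib_jars("/opt/solr", "/var/solr/data"):
        print(jar)
```

A jar added to /opt/solr/lib in an image layer is always found by such a lookup, whereas one added under /var/solr/data/lib is hidden as soon as a volume is mounted there.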

I'll create a JIRA :) 

Jan



Re: Modularizing Solr with new contrib packages

Posted by Jan Høydahl <ja...@cominvent.com>.
There is no lack of vision for future local and remote package repositories, but package management development has stalled, and first-party packages are out of reach in the 9.0.0 timeframe.

So we have to think progress over perfection, once again.

Phase 1 (9.0): Modularize Solr by extracting obvious low-hanging-fruit plugins into contribs/modules. Make it super easy to launch Solr with any of these on the classpath (SOLR-15914 <https://issues.apache.org/jira/browse/SOLR-15914>).
Phase 2 (9.x): Evolve the package manager and make it possible to optionally install the modules as 1st party packages instead (still fat distro).
Phase 3 (10.0?): Extract even more features as modules, and publish all modules as separate delivery artifacts on DLCDN.

Regarding phase 2 in 9.x: we cannot really extract a feature into a module in, e.g., 9.1, because users upgrading from 9.0 would then get a ClassNotFoundException. That breaks back-compat. But perhaps we could continue modularization efforts in 9.x if we make sure that all new modules extracted in a minor release are automatically added to the classloader? The classes would still disappear from solr-core.jar, which could break someone's custom embedded use case, but 99% of users would be unaffected. Wdyt?
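
To make the "automatically added to the classloader" idea concrete, here is a minimal sketch of how the set of modules to load could be computed. It is only an illustration under assumptions: the function `effective_modules` and the module names "sql" and "jwt-auth" are hypothetical and do not describe an existing Solr API.

```python
def effective_modules(requested, auto_enabled):
    """Return the modules to load: auto-enabled ones first, then the user's
    own choices, without duplicates and in a stable order."""
    ordered = []
    for name in list(auto_enabled) + list(requested):
        if name not in ordered:
            ordered.append(name)
    return ordered


# Hypothetical scenario: "sql" was split out of solr-core in a 9.x minor
# release, so it is auto-enabled; the user additionally asked for "jwt-auth".
print(effective_modules(["jwt-auth", "sql"], ["sql"]))  # prints ['sql', 'jwt-auth']
```

With a rule like this, an upgrade within 9.x keeps working without configuration changes, and the auto-enabled list could be emptied in the next major release.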

In any case, I think for 9.x the realistic route is to keep our fat tgz, but make it slimmer by removing redundancy and pruning the number of overlapping dependencies. That can get us a long way.

Jan



Re: Modularizing Solr with new contrib packages

Posted by David Smiley <ds...@apache.org>.
Shawn:
* RE redundancies of stuff in /dist/, see
https://issues.apache.org/jira/browse/SOLR-15916
* RE "contrib" vs "module" vs "package", see:
https://issues.apache.org/jira/browse/SOLR-15917
* RE not shipping these extras with the Solr distribution, see: "slim
distro" mention in the document "Solr first party packages"
https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing

It could very well be worth shipping two docker images in the meantime.
Or maybe a zip of each module could be a separate artifact that is
published?  I'm not sure what freedoms we have to do this in the ASF.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Modularizing Solr with new contrib packages

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/12/2022 8:31 AM, Jan Høydahl wrote:
> I think there are lots of pieces of code in solr-core that can easily be extracted the same way.
> Some perhaps even for 9.0.0, as it slims down the core and reduces attack surface for most users as well.

I think it would be really awesome if we had a core download that only 
included basic functionality, and all the other fancy things that Solr 
does now out of the box (as well as those that are contrib) could be 
added after download via package scripting or just additional downloads.

The size of solr-8.11.1.tgz is 207MiB, or 218076598 bytes.  The .zip 
version is slightly larger.  8.0.0 was 163MiB, 7.0.0 was 142MiB, 6.0.0 
was 131MiB, and 1.4.1 was 53.7MiB.  I think it's insane that the 
download is so big ... and a lot of what makes it big are things that 
the vast majority of our users will never use.
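
The arithmetic behind these figures can be checked with a few lines. This is just a worked calculation on the sizes quoted above; the byte count for 8.11.1 works out to just under 208 MiB (quoted as 207MiB), and the 8.11.1 archive is roughly 3.9 times the size of 1.4.1. The name `growth_factors` is made up for the example.

```python
MIB = 1024 * 1024

# Sizes in MiB as quoted above; the 8.11.1 value is derived from its byte count.
sizes_mib = {
    "1.4.1": 53.7,
    "6.0.0": 131.0,
    "7.0.0": 142.0,
    "8.0.0": 163.0,
    "8.11.1": 218076598 / MIB,
}


def growth_factors(sizes, baseline):
    """Express every size as a multiple of the baseline version's size."""
    base = sizes[baseline]
    return {version: size / base for version, size in sizes.items()}


for version, factor in growth_factors(sizes_mib, "1.4.1").items():
    print(f"{version}: {sizes_mib[version]:.1f} MiB, {factor:.2f}x the 1.4.1 download")
```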

Large reductions in the overall size of the main download would be 
possible by putting hadoop, calcite, some of the really large lucene 
analysis components, and the contrib stuff into packages.  The 
extraction contrib alone is 43.5MiB compressed in zip format.

I would suggest moving zookeeper and its dependencies as well, but I 
think we probably want SolrCloud to be part of base functionality.

Some of the large jars are included for what are probably insignificant 
usages, and I wonder if that functionality could be replaced by newer 
native functions available in Java 8 and later.  I am eyeballing things 
like guava and the commons-* jars here, but I am sure there are other 
things in this category.  I'd like to eliminate as many dependencies as 
we can.

Extracting some things from the solr-core jar into other jars sounds 
like a really awesome idea.

I don't think the solr-core jar should be in the dist directory.  It's 
useless by itself, because it will still have a LOT of dependencies even 
if we shrink it.  And there are likely other things in the dist 
directory that fall into that category.  The test framework and its 
dependencies are a good candidate for removal.

By removing some of the low-hanging fruit that I am SURE isn't needed 
for base binary functionality on the 8.11.1 download, I was able to end 
up with a .zip file of 60.4MiB, and I am sure at least a little 
bit of further reduction is possible if we can fully map out 
dependencies.  I think we can leverage gradle to provide some dependency 
info.
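
Before mapping dependencies properly, a quick way to see where the bytes go is to list the largest jars in an unpacked distribution. The sketch below assumes the download has been extracted to a local directory; `largest_jars` is a hypothetical helper, not part of Solr or its build.

```python
import sys
from pathlib import Path


def largest_jars(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs for the biggest .jar
    files found anywhere under `root`, largest first."""
    jars = [(p.stat().st_size, p) for p in Path(root).rglob("*.jar") if p.is_file()]
    jars.sort(key=lambda item: item[0], reverse=True)
    return jars[:top]


if __name__ == "__main__":
    # Usage: python largest_jars.py /path/to/unpacked/solr
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in largest_jars(root):
        print(f"{size / (1024 * 1024):8.1f} MiB  {path}")
```

This only shows file sizes; it says nothing about which jars are actually needed at runtime, which is where dependency information from the build would come in.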

Exactly how to organize the code repo to create divided artifacts is 
something that we would need to think about.  My initial idea is 
changing "contrib" to "package" and then making some new directories 
under package.

Thanks,
Shawn
