Posted to solr-user@lucene.apache.org by Demian Katz <de...@villanova.edu> on 2016/07/28 19:29:47 UTC

Installing Solr as a dependency

Hello,

I develop an open source project (https://github.com/vufind-org/vufind) that depends on Solr, and I'm trying to figure out if there is a better way to manage the Solr dependency.

Presently, I simply bundle Solr with my software by committing the latest distribution to my Git repo. Over time, having all of these large binaries is causing repository bloat and slow Git performance. I'm beginning to wonder whether there's a better way. With the rise in the popularity of dependency managers like NPM and Composer, it seems like it might be nice to somehow be able to declare Solr as a dependency and have it installed automatically on the client side rather than bundling the whole gigantic application by hand... however, as far as I can tell, there's no way to do this presently (at least, not unless you count specialized niche projects like https://github.com/projecthydra/hydra-jetty, which are not exactly what I'm looking for).

Just curious if others are dealing with this problem in other ways, or if there are any tool-based approaches that I haven't discovered on my own.

Thanks,
Demian

RE: Installing Solr as a dependency

Posted by Demian Katz <de...@villanova.edu>.
I did think about Maven, but (probably because I'm a Maven newbie) I didn't find an obvious way to do it and figured that Maven was meant more for libraries than for complete applications. In any case, your answer gives me more to work with, so I'll do some experimentation. Thanks!

- Demian
________________________________________
From: Daniel Collins [danwcollins@gmail.com]
Sent: Friday, July 29, 2016 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Installing Solr as a dependency

Can't you use Maven?  I thought that was the standard dependency management
tool, and Solr is published to Maven repos.  There used to be a solr
artifact which was the WAR file, but presumably now, you'd have to pull down

  <groupId>org.apache.solr</groupId>
  <artifactId>solr-parent</artifactId>

and maybe then start that up.

We have an internal application that depends on solr-core (it's a
web app; we basically embed bits of Solr), and Maven works fine for us.  We
do patch and build Solr internally though to our own corporate maven repos,
so that helps :)  But I've done it outside the corporate environment and
found recent Solr releases on standard maven repo sites.



Re: Installing Solr as a dependency

Posted by Daniel Collins <da...@gmail.com>.
Can't you use Maven?  I thought that was the standard dependency management
tool, and Solr is published to Maven repos.  There used to be a solr
artifact which was the WAR file, but presumably now, you'd have to pull down

  <groupId>org.apache.solr</groupId>
  <artifactId>solr-parent</artifactId>

and maybe then start that up.
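
To make the "pull down" step concrete, here is a minimal Python sketch of how Maven coordinates map to a download URL under the standard Maven repository layout. The function name `maven_artifact_url`, the Maven Central base URL, and the version 6.1.0 are assumptions chosen for illustration; a corporate repository would use a different base URL.

```python
# Assumption: Maven Central's public base URL; a private repo would differ.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_artifact_url(group_id, artifact_id, version,
                       packaging="jar", base_url=MAVEN_CENTRAL):
    """Build a download URL using the standard Maven repository layout.

    Layout: <base>/<group with dots as slashes>/<artifact>/<version>/
            <artifact>-<version>.<packaging>
    """
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}.{packaging}"
    return f"{base_url}/{group_path}/{artifact_id}/{version}/{filename}"


if __name__ == "__main__":
    # Hypothetical example: the solr-core jar for an example version.
    print(maven_artifact_url("org.apache.solr", "solr-core", "6.1.0"))
```

This only computes a URL; resolving transitive dependencies is what Maven itself adds on top of this layout.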

We have an internal application that depends on solr-core (it's a
web app; we basically embed bits of Solr), and Maven works fine for us.  We
do patch and build Solr internally though to our own corporate maven repos,
so that helps :)  But I've done it outside the corporate environment and
found recent Solr releases on standard maven repo sites.


On 29 July 2016 at 15:12, Shawn Heisey <ap...@elyograg.org> wrote:

> On 7/28/2016 1:29 PM, Demian Katz wrote:
> > I develop an open source project
> > (https://github.com/vufind-org/vufind) that depends on Solr, and I'm
> > trying to figure out if there is a better way to manage the Solr
> > dependency. Presently, I simply bundle Solr with my software by
> > committing the latest distribution to my Git repo. Over time, having
> > all of these large binaries is causing repository bloat and slow Git
> > performance. I'm beginning to wonder whether there's a better way.
> > With the rise in the popularity of dependency managers like NPM and
> > Composer, it seems like it might be nice to somehow be able to declare
> > Solr as a dependency and have it installed automatically on the client
> > side rather than bundling the whole gigantic application by hand...
> > however, as far as I can tell, there's no way to do this presently (at
> > least, not unless you count specialized niche projects like
> > https://github.com/projecthydra/hydra-jetty, which are not exactly
> > what I'm looking for). Just curious if others are dealing with this
> > problem in other ways, or if there are any tool-based approaches that
> > I haven't discovered on my own.
>
> I wouldn't include Solr in my own project at all.  I would probably
> request that the user download the binary artifact and put it in a
> predictable location, and configure my installation script to do the
> download if the file is not there.  I would strongly recommend taking
> advantage of Apache's mirror system for that download -- although if you
> need a specific version of Solr, you will find that the mirror system
> only has the latest version, and you must go to the Apache Archives for
> older versions.
>
> To reduce load on the Apache Archive, you could place a copy of the
> binary on your own download servers ... and you could probably greatly
> reduce the size of that download by stripping out components that your
> software doesn't need.  If users want to enable additional
> functionality, they would be free to download the full Solr binary from
> Apache.
>
> I once discovered that if optional components are removed (including
> some jars in the webapp), the Solr download drops from 150+ MB to about
> 25 MB.
>
> https://issues.apache.org/jira/browse/SOLR-6806
>
> Thanks,
> Shawn
>
>

RE: Installing Solr as a dependency

Posted by Demian Katz <de...@villanova.edu>.
Thanks -- another interesting possibility, though I suppose the disadvantage to this strategy would be the dependency on Docker, which could be problematic for some users (especially those running Windows, where I understand that this could only be achieved with virtualization, which would almost certainly impact performance). Still, another option to put on the table!

- Demian

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Friday, July 29, 2016 8:02 PM
To: solr-user
Subject: Re: Installing Solr as a dependency

What about (not tried) pulling down an official Docker build and adding your stuff to that?
https://hub.docker.com/_/solr/

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/



Re: Installing Solr as a dependency

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
What about (not tried) pulling down an official Docker build and
adding your stuff to that?
https://hub.docker.com/_/solr/

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 30 July 2016 at 03:03, Demian Katz <de...@villanova.edu> wrote:
>> I wouldn't include Solr in my own project at all.  I would probably
>> request that the user download the binary artifact and put it in a
>> predictable location, and configure my installation script to do the
>> download if the file is not there.  I would strongly recommend taking
>> advantage of Apache's mirror system for that download -- although if you
>> need a specific version of Solr, you will find that the mirror system
>> only has the latest version, and you must go to the Apache Archives for
>> older versions.
>>
>> To reduce load on the Apache Archive, you could place a copy of the
>> binary on your own download servers ... and you could probably greatly
>> reduce the size of that download by stripping out components that your
>> software doesn't need.  If users want to enable additional
>> functionality, they would be free to download the full Solr binary from
>> Apache.
>
> Yes, this is the reason I was hoping to use some sort of dependency management tool. The idea of downloading from Apache's system has definitely crossed my mind, but it's inherently more fragile than using a dependency manager (since Apache is at least theoretically free to change their URL structure, etc., at any time) and, as you say, it seemed impolite to direct potentially heavy amounts of traffic to Apache servers (especially when you consider that every commit to my project triggers one or more continuous integration builds, each of which would need to perform the download). Creating a project-specific mirror also crossed my mind, but that has its own set of problems: it's work to maintain it, and the server hosting it needs to be able to withstand the high traffic that would otherwise be directed at Apache. The idea of a theoretical dependency management tool still feels more attractive because it adds a standard, unchanging mechanism for obtaining specific versions of the software and it offers the possibility of local package caching across builds to significantly reduce the amount of HTTP traffic back and forth. Of course, it's a lot less attractive if it proves to be only theory and not in fact practically achievable -- I'll play around with Maven next week and see where that gets me.
>
> Anyway, I don't say any of that to dismiss your suggestions -- you present potentially viable possibilities, and I'll certainly keep those ideas on the table as I plan for the future -- but I thought it might be worthwhile to share my thinking. :-)
>
>> I once discovered that if optional components are removed (including
>> some jars in the webapp), the Solr download drops from 150+ MB to about
>> 25 MB.
>>
>> https://issues.apache.org/jira/browse/SOLR-6806
>
> This could actually be a separate argument for a dependency-management-based Solr structure, in that you could create a core solr package with minimum content that could recommend a whole array of optional dependencies. A script could then be used to build different versions of the download package from these -- one with just the core, one with all the optional stuff included. Those who wanted some intermediate number of files could be encouraged to manually create their desired build from packages.
>
> But again, I freely admit that everything I'm saying is based on experience with package managers outside the realm of Java -- I need to learn more about Maven (and perhaps Ivy) before I can make any particularly intelligent statements about what is really possible in this context. :-)
>
> - Demian

RE: Installing Solr as a dependency

Posted by Demian Katz <de...@villanova.edu>.
> I wouldn't include Solr in my own project at all.  I would probably
> request that the user download the binary artifact and put it in a
> predictable location, and configure my installation script to do the
> download if the file is not there.  I would strongly recommend taking
> advantage of Apache's mirror system for that download -- although if you
> need a specific version of Solr, you will find that the mirror system
> only has the latest version, and you must go to the Apache Archives for
> older versions.
>
> To reduce load on the Apache Archive, you could place a copy of the
> binary on your own download servers ... and you could probably greatly
> reduce the size of that download by stripping out components that your
> software doesn't need.  If users want to enable additional
> functionality, they would be free to download the full Solr binary from
> Apache.

Yes, this is the reason I was hoping to use some sort of dependency management tool. The idea of downloading from Apache's system has definitely crossed my mind, but it's inherently more fragile than using a dependency manager (since Apache is at least theoretically free to change their URL structure, etc., at any time) and, as you say, it seemed impolite to direct potentially heavy amounts of traffic to Apache servers (especially when you consider that every commit to my project triggers one or more continuous integration builds, each of which would need to perform the download). Creating a project-specific mirror also crossed my mind, but that has its own set of problems: it's work to maintain it, and the server hosting it needs to be able to withstand the high traffic that would otherwise be directed at Apache. The idea of a theoretical dependency management tool still feels more attractive because it adds a standard, unchanging mechanism for obtaining specific versions of the software and it offers the possibility of local package caching across builds to significantly reduce the amount of HTTP traffic back and forth. Of course, it's a lot less attractive if it proves to be only theory and not in fact practically achievable -- I'll play around with Maven next week and see where that gets me.
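
The caching behaviour described above can be sketched in a few lines of Python, assuming the project pins a SHA-256 checksum for the exact Solr release it needs. The names `cached_artifact` and `sha256_of` are hypothetical, and the `fetch` callback stands in for whatever download mechanism is eventually chosen; this is not how any existing dependency manager implements it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_artifact(cache_dir, name, expected_sha256, fetch):
    """Return a verified file from the cache, calling fetch(path) on a miss.

    `fetch` is a caller-supplied function (hypothetical) that writes the
    artifact to the given path, for example by downloading it.
    """
    path = Path(cache_dir) / name
    if path.exists() and sha256_of(path) == expected_sha256:
        return path  # cache hit: no network traffic needed
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    actual = sha256_of(path)
    if actual != expected_sha256:
        path.unlink()  # do not leave a corrupt or unexpected file behind
        raise ValueError(f"checksum mismatch for {name}: {actual}")
    return path
```

If the cache directory is preserved between continuous integration builds, only the first build for a given version would trigger a download.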

Anyway, I don't say any of that to dismiss your suggestions -- you present potentially viable possibilities, and I'll certainly keep those ideas on the table as I plan for the future -- but I thought it might be worthwhile to share my thinking. :-)

> I once discovered that if optional components are removed (including
> some jars in the webapp), the Solr download drops from 150+ MB to about
> 25 MB.
>
> https://issues.apache.org/jira/browse/SOLR-6806

This could actually be a separate argument for a dependency-management-based Solr structure, in that you could create a core solr package with minimum content that could recommend a whole array of optional dependencies. A script could then be used to build different versions of the download package from these -- one with just the core, one with all the optional stuff included. Those who wanted some intermediate number of files could be encouraged to manually create their desired build from packages.
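
As a rough illustration of that idea, the Python sketch below models a core package plus optional packages and derives the file list for different builds. The package names and paths in `PACKAGES` are hypothetical and do not correspond to any real Solr packaging.

```python
# Hypothetical manifest: package names and paths are illustrative only.
CORE = "solr-core"
PACKAGES = {
    "solr-core": ["bin/", "server/"],
    "solr-contrib": ["contrib/"],
    "solr-docs": ["docs/"],
    "solr-examples": ["example/"],
}


def files_for_build(optional=()):
    """Return the paths for the core package plus the chosen optional ones."""
    unknown = [n for n in optional if n not in PACKAGES or n == CORE]
    if unknown:
        raise KeyError(f"unknown optional packages: {unknown}")
    files = []
    for name in [CORE, *optional]:
        files.extend(PACKAGES[name])
    return files


def full_build():
    """Return the paths for a build with every optional package included."""
    return files_for_build([n for n in PACKAGES if n != CORE])
```

A minimal download would correspond to `files_for_build()`, the complete one to `full_build()`, and anything in between to a hand-picked list of optional packages.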

But again, I freely admit that everything I'm saying is based on experience with package managers outside the realm of Java -- I need to learn more about Maven (and perhaps Ivy) before I can make any particularly intelligent statements about what is really possible in this context. :-)

- Demian

Re: Installing Solr as a dependency

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/28/2016 1:29 PM, Demian Katz wrote:
> I develop an open source project
> (https://github.com/vufind-org/vufind) that depends on Solr, and I'm
> trying to figure out if there is a better way to manage the Solr
> dependency. Presently, I simply bundle Solr with my software by
> committing the latest distribution to my Git repo. Over time, having
> all of these large binaries is causing repository bloat and slow Git
> performance. I'm beginning to wonder whether there's a better way.
> With the rise in the popularity of dependency managers like NPM and
> Composer, it seems like it might be nice to somehow be able to declare
> Solr as a dependency and have it installed automatically on the client
> side rather than bundling the whole gigantic application by hand...
> however, as far as I can tell, there's no way to do this presently (at
> least, not unless you count specialized niche projects like
> https://github.com/projecthydra/hydra-jetty, which are not exactly
> what I'm looking for). Just curious if others are dealing with this
> problem in other ways, or if there are any tool-based approaches that
> I haven't discovered on my own.

I wouldn't include Solr in my own project at all.  I would probably
request that the user download the binary artifact and put it in a
predictable location, and configure my installation script to do the
download if the file is not there.  I would strongly recommend taking
advantage of Apache's mirror system for that download -- although if you
need a specific version of Solr, you will find that the mirror system
only has the latest version, and you must go to the Apache Archives for
older versions.
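
A minimal Python sketch of the "download only if the file is not there" step might look like the following. The archive URL pattern reflects how the Apache Archive laid out Solr releases at the time of writing and should be treated as an assumption that may change; `ensure_solr` and the injectable `fetch` parameter are hypothetical names.

```python
import os
import shutil
import urllib.request

# Assumption: Apache Archive URL layout for Solr releases; it may change.
ARCHIVE_URL = (
    "https://archive.apache.org/dist/lucene/solr/{version}/solr-{version}.tgz"
)


def solr_archive_url(version):
    """Return the assumed archive URL for a specific Solr version."""
    return ARCHIVE_URL.format(version=version)


def _http_fetch(url, path):
    """Stream the URL to a local file."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


def ensure_solr(version, dest_dir, fetch=_http_fetch):
    """Return the tarball path, downloading it only if it is absent."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, f"solr-{version}.tgz")
    if os.path.exists(target):
        return target  # already in the predictable location
    partial = target + ".part"
    fetch(solr_archive_url(version), partial)
    os.replace(partial, target)  # rename so an interrupted download is not reused
    return target
```

Downloading to a `.part` file and renaming it afterwards keeps a half-finished download from being mistaken for a complete one on the next run.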

To reduce load on the Apache Archive, you could place a copy of the
binary on your own download servers ... and you could probably greatly
reduce the size of that download by stripping out components that your
software doesn't need.  If users want to enable additional
functionality, they would be free to download the full Solr binary from
Apache.

I once discovered that if optional components are removed (including
some jars in the webapp), the Solr download drops from 150+ MB to about
25 MB.

https://issues.apache.org/jira/browse/SOLR-6806
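
One way to produce such a reduced download is to copy the release tarball while skipping optional top-level directories. The Python sketch below assumes that `contrib`, `docs`, and `example` are the parts to drop; that list is a guess for illustration, not the exact set examined in SOLR-6806, and `strip_tarball` is a hypothetical helper.

```python
import tarfile

# Assumption: top-level directories treated as optional (illustrative only).
OPTIONAL_DIRS = ("contrib", "docs", "example")


def is_optional(member_name, optional_dirs=OPTIONAL_DIRS):
    """True if a member such as 'solr-6.1.0/contrib/x.jar' is optional."""
    parts = member_name.split("/")
    # parts[0] is the release folder; parts[1] is the top-level directory.
    return len(parts) > 1 and parts[1] in optional_dirs


def strip_tarball(src, dst, optional_dirs=OPTIONAL_DIRS):
    """Copy src to dst (gzip-compressed) without the optional directories.

    Returns the number of archive members that were kept.
    """
    kept = 0
    with tarfile.open(src, "r:*") as source, \
            tarfile.open(dst, "w:gz") as target:
        for member in source:
            if is_optional(member.name, optional_dirs):
                continue
            # Only regular files carry data; directories and links do not.
            data = source.extractfile(member) if member.isfile() else None
            target.addfile(member, data)
            kept += 1
    return kept
```

Which directories are safe to drop depends on the features the bundling application actually uses, so the list should be checked against a working installation before publishing a trimmed archive.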

Thanks,
Shawn