Posted to solr-dev@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2009/04/13 19:56:50 UTC
Make ant example faster
Hello,
As part of SOLR-934, I'd like to set up an example for indexing mail boxes
with the existing example/example-DIH demo. I see that ant example has a
dependency on example-contrib. Do we want to do that? I vaguely remember
Yonik complaining about the time ant example takes.
For setting up the MailEntityProcessor, I'd have to copy mail, activation
and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
How about we remove the dependency on example-contrib and keep it as an
independent target?
--
Regards,
Shalin Shekhar Mangar.
Re: Make ant example faster
Posted by Ryan McKinley <ry...@gmail.com>.
On Apr 22, 2009, at 12:20 PM, Erik Hatcher wrote:
> I was aiming simple... like some simple tweaks to
> SolrResourceLoader, at least a way to allow plugins to all live
> separately and wired into a single Solr instance without copying
> files and such.
>
> What would it take to wire in OSGI (I know nothing about it)?
>
From my brief experience with OSGi, I don't think it is something we
can easily tack on to our existing structure. However it is something
we should definitely consider for 2.0
I think extending SolrResourceLoader is a good option for 1.4
ryan
> Erik
>
>
> On Apr 22, 2009, at 12:18 PM, Grant Ingersoll wrote:
>
>> Even better, is probably something like OSGI where we can make sure
>> that we have some level of isolation between the class loaders so
>> that we can have different versions of different JARs w/o breaking
>> the application. Since it is clear that Solr is entering into a
>> "contrib" phase, it is only a matter of time before we start having
>> version clashes between libraries.
>>
>> On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:
>>
>>> Wouldn't one solution to this bundling and aggregating/separating
>>> of examples and plugins be made a lot less painful if
>>> SolrResourceLoader could load from a list of directories rather
>>> than only a single directory? What are the negatives to adding
>>> that support? Let's keep solr.war lean and mean, with all
>>> extensions simply appended to a list of JAR containing directories?
>>>
>>> I know, we're recreating a container of sorts, but we already got
>>> SolrResourceLoader, so maybe just some tweaks there can make
>>> example bundling a lot more pleasurable?
>>>
>>> Erik
>>>
>>
>
Re: Make ant example faster
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I was aiming simple... like some simple tweaks to SolrResourceLoader,
at least a way to allow plugins to all live separately and wired into
a single Solr instance without copying files and such.
What would it take to wire in OSGI (I know nothing about it)?
Erik
On Apr 22, 2009, at 12:18 PM, Grant Ingersoll wrote:
> Even better, is probably something like OSGI where we can make sure
> that we have some level of isolation between the class loaders so
> that we can have different versions of different JARs w/o breaking
> the application. Since it is clear that Solr is entering into a
> "contrib" phase, it is only a matter of time before we start having
> version clashes between libraries.
>
> On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:
>
>> Wouldn't one solution to this bundling and aggregating/separating
>> of examples and plugins be made a lot less painful if
>> SolrResourceLoader could load from a list of directories rather
>> than only a single directory? What are the negatives to adding
>> that support? Let's keep solr.war lean and mean, with all
>> extensions simply appended to a list of JAR containing directories?
>>
>> I know, we're recreating a container of sorts, but we already got
>> SolrResourceLoader, so maybe just some tweaks there can make
>> example bundling a lot more pleasurable?
>>
>> Erik
>>
>
Re: Make ant example faster
Posted by Grant Ingersoll <gs...@apache.org>.
Even better, is probably something like OSGI where we can make sure
that we have some level of isolation between the class loaders so that
we can have different versions of different JARs w/o breaking the
application. Since it is clear that Solr is entering into a "contrib"
phase, it is only a matter of time before we start having version
clashes between libraries.
On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:
> Wouldn't one solution to this bundling and aggregating/separating of
> examples and plugins be made a lot less painful if
> SolrResourceLoader could load from a list of directories rather than
> only a single directory? What are the negatives to adding that
> support? Let's keep solr.war lean and mean, with all extensions
> simply appended to a list of JAR containing directories?
>
> I know, we're recreating a container of sorts, but we already got
> SolrResourceLoader, so maybe just some tweaks there can make example
> bundling a lot more pleasurable?
>
> Erik
>
Re: Make ant example faster
Posted by Chris Hostetter <ho...@fucit.org>.
: Wouldn't one solution to this bundling and aggregating/separating of examples
: and plugins be made a lot less painful if SolrResourceLoader could load from a
: list of directories rather than only a single directory? What are the
I'm not understanding how that would help the example situation. what
are you envisioning that the instanceDir would look like? how
would SolrResourceLoader know which directories to use?
right now SolrResourceLoader assumes (instanceDir + "lib/") will contain a
bunch of jars ... i can imagine that we could let that directory contain
other directories and walk it recursively looking for jars, and then
people could put symlinks in it to other lib directories -- but how would
that help us with the example? would we create the symlinks via ant? can
tgz/zip files store symlinks efficiently?
Or are you thinking that we would add a new way to specify additional lib
dir paths in the solrconfig.xml? ... i suppose that would be possible, but
i think it would require some funky changes to SolrConfig and Config to
parse out the lib dirs before parsing anything else (that would need to
know about the SolrResourceLoader)
-Hoss
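A minimal sketch of the "list of lib directories, walked recursively" idea discussed above. It is written in Python purely for illustration and is not Solr code; the function name `collect_jars` is hypothetical, and symlinked directories are followed on the assumption that they contain no cycles.

```python
import os
from pathlib import Path


def collect_jars(lib_dirs):
    """Return de-duplicated .jar paths found under each directory in lib_dirs.

    Directories are walked recursively and symlinked subdirectories are
    followed (this sketch assumes the links do not form a cycle).
    """
    seen = set()
    jars = []
    for lib_dir in lib_dirs:
        root = Path(lib_dir)
        if not root.is_dir():
            continue  # silently skip directories that do not exist
        for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                if not name.endswith(".jar"):
                    continue
                real = Path(dirpath, name).resolve()
                if real not in seen:  # the same jar may be reachable twice
                    seen.add(real)
                    jars.append(real)
    return jars
```

The resulting list is the kind of input a class loader would need; how the list of directories gets configured (symlinks under lib/, or extra paths in solrconfig.xml) is the open question in the message above.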
Re: Make ant example faster
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Wouldn't one solution to this bundling and aggregating/separating of
examples and plugins be made a lot less painful if SolrResourceLoader
could load from a list of directories rather than only a single
directory? What are the negatives to adding that support? Let's
keep solr.war lean and mean, with all extensions simply appended to a
list of JAR containing directories?
I know, we're recreating a container of sorts, but we already got
SolrResourceLoader, so maybe just some tweaks there can make example
bundling a lot more pleasurable?
Erik
Re: Make ant example faster
Posted by Chris Hostetter <ho...@fucit.org>.
: > assuming we have more use-case specific examples, wouldn't that just be
: > something that copies one of them to a target directory?
:
: I guess what I really want is a way to be able to say: Give me a Solr home
: that has these X features (DIH, Solr Cell, spell checking, highlighting, plus
: whatever libs are needed) with some basic configuration + my choice of a
: schema ranging from one that is barebones (maybe just an "id" field defined)
: to a "full fledged" one (the current example) and I want to be able to do it
: as simple as possible (i.e. as few commands as possible).
Ah.... i'm understanding now. you don't just want a lot of good
micro-examples of each feature, you want an easy way to generate "default"
configs that work for an arbitrary set of features specified by the user.
That seems like a hard problem to get right in a generic way.
The simplest method i can think of for achieving that would be to start
with a kitchen-sink type example that includes *everything* (because then
it's easy to test that all of the pieces work well together and don't
collide -- duplicate fieldnames or handler names etc...) and then use xml
comments or some other templating to be able to split that kitchen sink
file up into snippets -- which could then be combined again in lots of
combinations.
(Or ... I suppose the snippets could be maintained by hand and then the
build system could generate the kitchen sink and run tests to ensure that
none of them collide ... but maintaining the kitchen-sink by hand seems
easier in a weird way)
-Hoss
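A rough sketch of the "split the kitchen sink with xml comments" approach described above, in Python for illustration only. The `BEGIN name` / `END name` marker format and the function names are assumptions made up for this sketch; nothing like them exists in the Solr example configs.

```python
import re

# Hypothetical marker convention: <!-- BEGIN name --> ... <!-- END name -->
SNIPPET_RE = re.compile(
    r"<!--\s*BEGIN\s+(?P<name>[\w-]+)\s*-->\n?"
    r"(?P<body>.*?)"
    r"<!--\s*END\s+(?P=name)\s*-->",
    re.DOTALL,
)


def split_snippets(kitchen_sink):
    """Map each marker name to the text between its BEGIN and END comments."""
    snippets = {}
    for match in SNIPPET_RE.finditer(kitchen_sink):
        name = match.group("name")
        if name in snippets:
            # Two snippets with one name is the kind of collision to catch.
            raise ValueError(f"duplicate snippet name: {name}")
        snippets[name] = match.group("body")
    return snippets


def combine(snippets, wanted):
    """Concatenate the requested snippets in the order given."""
    return "".join(snippets[name] for name in wanted)
```

With the kitchen-sink file as the single hand-maintained source, a build step could call `split_snippets` once and then `combine` for each set of features a user asks for.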
Re: Make ant example faster
Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 20, 2009, at 5:45 PM, Chris Hostetter wrote:
>
> : Fair enough. FWIW, I'd still like to be able to generate a Solr container
> : from an example (i.e. "minimal" or "DIH" or whatever)
>
> by "container" do you mean a Solr home with configs and necessary libs
> ready to be tweaked to suit your purposes?
>
> assuming we have more use-case specific examples, wouldn't that just be
> something that copies one of them to a target directory?
I guess what I really want is a way to be able to say: Give me a Solr
home that has these X features (DIH, Solr Cell, spell checking,
highlighting, plus whatever libs are needed) with some basic
configuration + my choice of a schema ranging from one that is
barebones (maybe just an "id" field defined) to a "full fledged" one
(the current example) and I want to be able to do it as simple as
possible (i.e. as few commands as possible).
-Grant
Re: Make ant example faster
Posted by Chris Hostetter <ho...@fucit.org>.
: Fair enough. FWIW, I'd still like to be able to generate a Solr container
: from an example (i.e. "minimal" or "DIH" or whatever)
by "container" do you mean a Solr home with configs and necessary libs
ready to be tweaked to suit your purposes?
assuming we have more use-case specific examples, wouldn't that just be
something that copies one of them to a target directory?
-Hoss
Re: Make ant example faster
Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 16, 2009, at 7:37 PM, Chris Hostetter wrote:
>
> : It is similar, indeed, but I think it results in there only ever being one
> : active Solr example and the user need not worry about setting solr home.
>
> Hmmm... this seems like a bad idea.
>
> we want to make sure that *users* who have downloaded Solr can run all of
> the examples without needing ant ... having a single "active" example and
> using an ant target to change it would mean that if i install solr and
> then go through the tutorial (using the tutorial example), i would need to
> (understand and run) ant to see the DIH example.
>
> It seems like it would make a lot more sense to have lots of examples and
> let the user set the solr home to try them out -- that's very easy to do.
>
> I'm not sure i really understand the concern about how long "ant example"
> takes ... it's a build time task, and it only takes ~15 seconds on my box
> if everything is up to date (if everything isn't up to date then
> compilation is going to take much longer than what "example" does) ... the
> longest contributor to the time seems to be contrib/javascript's "docs"
> target -- but i'm guessing some ant tricks to check directory mod times
> before running the jsrun.jar could shave that off as well.
Fair enough. FWIW, I'd still like to be able to generate a Solr
container from an example (i.e. "minimal" or "DIH" or whatever)
Re: Make ant example faster
Posted by Chris Hostetter <ho...@fucit.org>.
: It is similar, indeed, but I think it results in there only ever being one
: active Solr example and the user need not worry about setting solr home.
Hmmm... this seems like a bad idea.
we want to make sure that *users* who have downloaded Solr can run all of
the examples without needing ant ... having a single "active" example and
using an ant target to change it would mean that if i install solr and
then go through the tutorial (using the tutorial example), i would need to
(understand and run) ant to see the DIH example.
It seems like it would make a lot more sense to have lots of examples and
let the user set the solr home to try them out -- that's very easy to do.
I'm not sure i really understand the concern about how long "ant example"
takes ... it's a build time task, and it only takes ~15 seconds on my box
if everything is up to date (if everything isn't up to date then
compilation is going to take much longer than what "example" does) ... the
longest contributor to the time seems to be contrib/javascript's "docs"
target -- but i'm guessing some ant tricks to check directory mod times
before running the jsrun.jar could shave that off as well.
-Hoss
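The "check directory mod times before running" trick mentioned above amounts to comparing the newest source file against the newest generated file. Below is a small Python sketch of that comparison, for illustration only; the function names are hypothetical, and in an actual build Ant's own uptodate task would be the more natural place for such a check.

```python
from pathlib import Path


def newest_mtime(directory):
    """Return the latest modification time of any file under directory.

    Returns None when the directory is missing or contains no files.
    """
    root = Path(directory)
    if not root.is_dir():
        return None
    times = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(times) if times else None


def needs_rebuild(source_dir, output_dir):
    """True when some source file is newer than everything already generated."""
    src = newest_mtime(source_dir)
    out = newest_mtime(output_dir)
    if src is None:
        return False  # nothing to build from
    if out is None:
        return True  # nothing has been generated yet
    return src > out
```

A docs step guarded this way would only be run when `needs_rebuild` returns True, which is what would shave the time off an otherwise up-to-date build.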
Re: Make ant example faster
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Apr 14, 2009 at 4:25 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
>
>>
>> Isn't this the same as the current setup with the name of the directory
>> changed and different ant targets to set them up? The new ant target will
>> set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
>> and avoid the need to type -Dsolr.solr.home.
>>
>
>
> It is similar, indeed, but I think it results in there only ever being one
> active Solr example and the user need not worry about setting solr home.
>
+1
Let's do it.
--
Regards,
Shalin Shekhar Mangar.
Re: Make ant example faster
Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
> On Tue, Apr 14, 2009 at 12:33 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
>>
>> Instead of a kitchen-sink example directory, we "revert" it back to being
>> the tutorial example. It still can get built by ant example, but ultimately
>> we "deprecate" it (more later).
>>
>> Then, as a replacement, we create a directory containing what I would call
>> Solr Templates, which contain subdirectories named appropriately for the
>> kind of example. Rather than explain, I'll give an example:
>>
>> The templates directory would contain the configurations (i.e. schema.xml
>> and solrconfig.xml) and any sample docs (but not the libraries) for:
>> tutorial - The current tutorial example
>> dih - The DIH example
>> extraction - Solr Cell example
>> geo - geo spatial example (once 773 is committed)
>> clustering - once SOLR-769 is committed
>> simple - A barebones schema and config (mainly used for
>> bootstrapping a new project for experienced users)
>> exploratory - Basically, the same as simple, but the schema defines
>> a single dynamic field - Think of Hoss's Solr Out of the Box talk from
>> ApacheCon whereby you want to quickly explore a new data set without having
>> to define a schema.
>> [other] -
>>
>> Note, the templates directory could also live under each contrib, but it
>> isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not
>> contrib-specific).
>>
>> Then, typing "ant example" would copy the necessary tutorial stuff to the
>> example directory (which still contains the Jetty stuff) but would not have
>> to recurse into any of the contribs.
>>
>> Typing "ant example -Dtype=clustering" would copy the clustering
>> requirements, plus go to contrib/clustering (or whatever) and get the
>> appropriate material such that the example directory. Similarly for any of
>> the other "templates"
>>
>
> Isn't this the same as the current setup with the name of the directory
> changed and different ant targets to set them up? The new ant target will
> set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
> and avoid the need to type -Dsolr.solr.home.
It is similar, indeed, but I think it results in there only ever being
one active Solr example and the user need not worry about setting solr
home.
-Grant
Re: Make ant example faster
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Apr 14, 2009 at 12:33 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
> Instead of a kitchen-sink example directory, we "revert" it back to being
> the tutorial example. It still can get built by ant example, but ultimately
> we "deprecate" it (more later).
>
> Then, as a replacement, we create a directory containing what I would call
> Solr Templates, which contain subdirectories named appropriately for the
> kind of example. Rather than explain, I'll give an example:
>
> The templates directory would contain the configurations (i.e. schema.xml
> and solrconfig.xml) and any sample docs (but not the libraries) for:
> tutorial - The current tutorial example
> dih - The DIH example
> extraction - Solr Cell example
> geo - geo spatial example (once 773 is committed)
> clustering - once SOLR-769 is committed
> simple - A barebones schema and config (mainly used for
> bootstrapping a new project for experienced users)
> exploratory - Basically, the same as simple, but the schema defines
> a single dynamic field - Think of Hoss's Solr Out of the Box talk from
> ApacheCon whereby you want to quickly explore a new data set without having
> to define a schema.
> [other] -
>
> Note, the templates directory could also live under each contrib, but it
> isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not
> contrib-specific).
>
> Then, typing "ant example" would copy the necessary tutorial stuff to the
> example directory (which still contains the Jetty stuff) but would not have
> to recurse into any of the contribs.
>
> Typing "ant example -Dtype=clustering" would copy the clustering
> requirements, plus go to contrib/clustering (or whatever) and get the
> appropriate material such that the example directory. Similarly for any of
> the other "templates"
>
Isn't this the same as the current setup with the name of the directory
changed and different ant targets to set them up? The new ant target will
set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
and avoid the need to type -Dsolr.solr.home.
>
> Additionally, you could also define -DoutputDir such that it would take and
> copy the whole example directory (including the appropriate type) to some
> output dir. This would allow one to quickly bootstrap a Solr project
> without having to do a lot of schema editing.
>
I like this idea. I have myself needed to do this a couple of times.
--
Regards,
Shalin Shekhar Mangar.
Re: Make ant example faster
Posted by Grant Ingersoll <gs...@apache.org>.
Funny you should mention it, b/c I had an idea the other day of how to
speed all this up, plus will satisfy one of my other annoyances with
the example and make it easier for people to get started (I think).
So, here goes:
Instead of a kitchen-sink example directory, we "revert" it back to
being the tutorial example. It still can get built by ant example,
but ultimately we "deprecate" it (more later).
Then, as a replacement, we create a directory containing what I would
call Solr Templates, which contain subdirectories named appropriately
for the kind of example. Rather than explain, I'll give an example:
The templates directory would contain the configurations (i.e.
schema.xml and solrconfig.xml) and any sample docs (but not the
libraries) for:
tutorial - The current tutorial example
dih - The DIH example
extraction - Solr Cell example
geo - geo spatial example (once 773 is committed)
clustering - once SOLR-769 is committed
simple - A barebones schema and config (mainly used for bootstrapping
a new project for experienced users)
exploratory - Basically, the same as simple, but the schema defines a
single dynamic field - Think of Hoss's Solr Out of the Box talk from
ApacheCon whereby you want to quickly explore a new data set without
having to define a schema.
[other] -
Note, the templates directory could also live under each contrib, but
it isn't necessarily a 1-1 thing (e.g. simple and exploratory
templates are not contrib-specific).
Then, typing "ant example" would copy the necessary tutorial stuff to
the example directory (which still contains the Jetty stuff) but would
not have to recurse into any of the contribs.
Typing "ant example -Dtype=clustering" would copy the clustering
requirements, plus go to contrib/clustering (or whatever) and get the
appropriate material such that the example directory. Similarly for
any of the other "templates"
Additionally, you could also define -DoutputDir such that it would
take and copy the whole example directory (including the appropriate
type) to some output dir. This would allow one to quickly bootstrap a
Solr project without having to do a lot of schema editing.
WDYT?
-Grant
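To make the proposal above concrete, here is a small Python sketch of what a template-copy step with a type and an output directory could do. It is only an illustration of the idea, not the ant target itself; the function name `make_example` and the directory layout in the usage note are assumptions.

```python
import shutil
from pathlib import Path


def make_example(templates_dir, example_type, output_dir):
    """Copy the template named example_type into output_dir and return its path."""
    template = Path(templates_dir) / example_type
    if not template.is_dir():
        available = sorted(
            p.name for p in Path(templates_dir).iterdir() if p.is_dir()
        )
        raise ValueError(
            f"unknown template {example_type!r}; available: {available}"
        )
    target = Path(output_dir)
    # dirs_exist_ok (Python 3.8+) lets the copy land in an existing directory,
    # e.g. an example directory that already holds the Jetty files.
    shutil.copytree(template, target, dirs_exist_ok=True)
    return target
```

For instance, `make_example("templates", "clustering", "my-project")` would roughly correspond to "ant example -Dtype=clustering -DoutputDir=my-project", minus the step that fetches the libraries from the contrib.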
On Apr 13, 2009, at 1:56 PM, Shalin Shekhar Mangar wrote:
> Hello,
>
> As part of SOLR-934, I'd like to set up an example for indexing mail boxes
> with the existing example/example-DIH demo. I see that ant example has a
> dependency on example-contrib. Do we want to do that? I vaguely remember
> Yonik complaining about the time ant example takes.
>
> For setting up the MailEntityProcessor, I'd have to copy mail, activation
> and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
> How about we remove the dependency on example-contrib and keep it as an
> independent target?
>
> --
> Regards,
> Shalin Shekhar Mangar.