Posted to solr-dev@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2009/04/13 19:56:50 UTC

Make ant example faster

Hello,

As part of SOLR-934, I'd like to set up an example for indexing mail boxes
with the existing example/example-DIH demo. I see that ant example has a
dependency on example-contrib. Do we want to do that? I vaguely remember
Yonik complaining about the time ant example takes.

For setting up the MailEntityProcessor, I'd have to copy mail, activation
and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
How about we remove the dependency on example-contrib and keep it as an
independent target?

-- 
Regards,
Shalin Shekhar Mangar.

Re: Make ant example faster

Posted by Ryan McKinley <ry...@gmail.com>.
On Apr 22, 2009, at 12:20 PM, Erik Hatcher wrote:

> I was aiming simple... like some simple tweaks to  
> SolrResourceLoader, at least a way to allow plugins to all live  
> separately and wired into a single Solr instance without copying  
> files and such.
>
> What would it take to wire in OSGI (I know nothing about it)?
>

From my brief experience with OSGi, I don't think it is something we  
can easily tack on to our existing structure.  However, it is something  
we should definitely consider for 2.0.

I think extending SolrResourceLoader is a good option for 1.4.

ryan


> 	Erik
>
>
> On Apr 22, 2009, at 12:18 PM, Grant Ingersoll wrote:
>
>> Even better, is probably something like OSGI where we can make sure  
>> that we have some level of isolation between the class loaders so  
>> that we can have different versions of different JARs w/o breaking  
>> the application.  Since it is clear that Solr is entering into a  
>> "contrib" phase, it is only a matter of time before we start having  
>> version clashes between libraries.
>>
>> On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:
>>
>>> Wouldn't one solution to this bundling and aggregating/separating  
>>> of examples and plugins be made a lot less painful if  
>>> SolrResourceLoader could load from a list of directories rather  
>>> than only a single directory?   What are the negatives to adding  
>>> that support?  Let's keep solr.war lean and mean, with all  
>>> extensions simply appended to a list of JAR containing directories?
>>>
>>> I know, we're recreating a container of sorts, but we already got  
>>> SolrResourceLoader, so maybe just some tweaks there can make  
>>> example bundling a lot more pleasurable?
>>>
>>> 	Erik
>>>
>>
>


Re: Make ant example faster

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I was aiming simple... like some simple tweaks to SolrResourceLoader,  
at least a way to allow plugins to all live separately and wired into  
a single Solr instance without copying files and such.

What would it take to wire in OSGI (I know nothing about it)?

	Erik


On Apr 22, 2009, at 12:18 PM, Grant Ingersoll wrote:

> Even better, is probably something like OSGI where we can make sure  
> that we have some level of isolation between the class loaders so  
> that we can have different versions of different JARs w/o breaking  
> the application.  Since it is clear that Solr is entering into a  
> "contrib" phase, it is only a matter of time before we start having  
> version clashes between libraries.
>
> On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:
>
>> Wouldn't one solution to this bundling and aggregating/separating  
>> of examples and plugins be made a lot less painful if  
>> SolrResourceLoader could load from a list of directories rather  
>> than only a single directory?   What are the negatives to adding  
>> that support?  Let's keep solr.war lean and mean, with all  
>> extensions simply appended to a list of JAR containing directories?
>>
>> I know, we're recreating a container of sorts, but we already got  
>> SolrResourceLoader, so maybe just some tweaks there can make  
>> example bundling a lot more pleasurable?
>>
>> 	Erik
>>
>


Re: Make ant example faster

Posted by Grant Ingersoll <gs...@apache.org>.
Even better, is probably something like OSGI where we can make sure  
that we have some level of isolation between the class loaders so that  
we can have different versions of different JARs w/o breaking the  
application.  Since it is clear that Solr is entering into a "contrib"  
phase, it is only a matter of time before we start having version  
clashes between libraries.

On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:

> Wouldn't one solution to this bundling and aggregating/separating of  
> examples and plugins be made a lot less painful if  
> SolrResourceLoader could load from a list of directories rather than  
> only a single directory?   What are the negatives to adding that  
> support?  Let's keep solr.war lean and mean, with all extensions  
> simply appended to a list of JAR containing directories?
>
> I know, we're recreating a container of sorts, but we already got  
> SolrResourceLoader, so maybe just some tweaks there can make example  
> bundling a lot more pleasurable?
>
> 	Erik
>



Re: Make ant example faster

Posted by Chris Hostetter <ho...@fucit.org>.
: Wouldn't one solution to this bundling and aggregating/separating of examples
: and plugins be made a lot less painful if SolrResourceLoader could load from a
: list of directories rather than only a single directory?   What are the

I'm not understanding how that would help the example situation.  what 
are you envisioning that the instanceDir would look like?  how 
would SolrResourceLoader know which directories to use?

right now SolrResourceLoader assumes (instanceDir + "lib/") will contain a 
bunch of jars ... i can imagine that we could let that directory contain 
other directories and walk it recursively looking for jars, and then 
people could put symlinks in it to other lib directories -- but how would 
that help us with the example?  would we create the symlinks via ant? can 
tgz/zip files store symlinks efficiently?
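
For what it's worth, the recursive walk itself is cheap to prototype. The following is a minimal sketch and not SolrResourceLoader code: the class name RecursiveLibScanner and the method findJars are made up for illustration, and it assumes a Java 8+ runtime with java.nio.file (newer than what Solr targets today). It only shows that a walk with FOLLOW_LINKS would pick up jars in nested or symlinked directories; it says nothing about whether tgz/zip can carry the symlinks.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr.
final class RecursiveLibScanner {

    /** Returns every *.jar file under libDir (following symlinks), sorted by path. */
    static List<Path> findJars(Path libDir) throws IOException {
        if (!Files.isDirectory(libDir)) {
            // A missing lib directory simply contributes no jars.
            return new ArrayList<>();
        }
        try (Stream<Path> paths = Files.walk(libDir, FileVisitOption.FOLLOW_LINKS)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Sorting is only there to make the resulting order predictable; whether jar order should matter is a separate question.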

Or are you thinking that we would add a new way to specify additional lib 
dir paths in the solrconfig.xml? ... i suppose that would be possible, but 
i think it would require some funky changes to SolrConfig and Config to 
parse out the lib dirs before parsing anything else (that would need to 
know about the SolrResourceLoader)
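
To make the "parse out the lib dirs first" step concrete, here is a rough sketch using only the JDK's DOM parser. It is an illustration under assumptions, not a proposal for how SolrConfig or Config should change: the <lib dir="..."/> element is a hypothetical syntax invented for this example, and LibDirPreParser is a made-up class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-parse step, run before any plugin classes need to be loaded.
final class LibDirPreParser {

    /** Returns the "dir" attribute of every <lib> element, in document order. */
    static List<String> libDirs(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        NodeList libs = doc.getElementsByTagName("lib");
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < libs.getLength(); i++) {
            Element lib = (Element) libs.item(i);
            if (lib.hasAttribute("dir")) {
                dirs.add(lib.getAttribute("dir"));
            }
        }
        return dirs;
    }
}
```

The point of the sketch is the ordering: this pass touches nothing but plain XML, so it can run before a resource loader exists, and the resulting list can then be handed to whatever builds the loader.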
 


-Hoss


Re: Make ant example faster

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Wouldn't one solution to this bundling and aggregating/separating of  
examples and plugins be made a lot less painful if SolrResourceLoader  
could load from a list of directories rather than only a single  
directory?   What are the negatives to adding that support?  Let's  
keep solr.war lean and mean, with all extensions simply appended to a  
list of JAR containing directories?

I know, we're recreating a container of sorts, but we already got  
SolrResourceLoader, so maybe just some tweaks there can make example  
bundling a lot more pleasurable?
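
Roughly what I have in mind, as a sketch only: the class MultiDirLibLoader and its methods are hypothetical names and not existing SolrResourceLoader API, and it assumes a Java 8+ runtime for java.nio.file. It collects the jars sitting directly in each listed directory and hands them to a plain URLClassLoader.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr.
final class MultiDirLibLoader {

    /** Collects the jars directly inside each directory, in the order the directories are listed. */
    static List<URL> jarUrls(List<Path> libDirs) throws IOException {
        List<URL> urls = new ArrayList<>();
        for (Path dir : libDirs) {
            if (!Files.isDirectory(dir)) {
                continue; // a missing optional plugin directory is skipped, not an error
            }
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : jars) {
                    urls.add(jar.toUri().toURL());
                }
            }
        }
        return urls;
    }

    /** Builds one class loader over all listed directories, delegating to the parent first. */
    static URLClassLoader newLoader(List<Path> libDirs, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(libDirs).toArray(new URL[0]), parent);
    }
}
```

One obvious negative shows up even in the sketch: everything shares a single class loader, so two plugins that need different versions of the same library would still clash.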

	Erik


Re: Make ant example faster

Posted by Chris Hostetter <ho...@fucit.org>.
: > assuming we have more use-case specific examples, wouldn't that just be
: > something that copies one of them to a target directory?
: 
: I guess what I really want is a way to be able to say:  Give me a Solr home
: that has these X features (DIH, Solr Cell, spell checking, highlighting, plus
: whatever libs are needed) with some basic configuration + my choice of a
: schema ranging from one that is barebones (maybe just an "id" field defined)
: to a "full fledged" one (the current example) and I want to be able to do it
: as simple as possible (i.e. as few commands as possible).

Ah.... i'm understanding now.  you don't just want a lot of good 
micro-examples of each feature, you want an easy way to generate "default" 
configs that work for an arbitrary set of features specified by the user.

That seems like a hard problem to get right in a generic way.

The simplest method i can think of for achieving that would be to start 
with a kitchen-sink type example that includes *everything* (because then 
it's easy to test that all of the pieces work well together and don't 
collide -- duplicate fieldnames or handler names etc...) and then use xml 
comments or some other templating to be able to split that kitchen sink 
file up into snippets -- which could then be combined again in lots of 
combinations.

(Or ... I suppose the snippets could be maintained by hand and then the 
build system could generate the kitchen sink and run tests to ensure that 
none of them collide ... but maintaining the kitchen-sink by hand seems 
easier in a weird way)
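
A toy version of the "xml comments as template markers" idea, purely as a sketch: the BEGIN/END comment convention and the class KitchenSinkFilter are invented for this example, nothing like them exists in the build, and nested feature blocks are not handled. Lines outside any marked block are always kept; lines inside a block are kept only if that feature was asked for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter over a kitchen-sink config file, one line at a time.
final class KitchenSinkFilter {
    // Marker lines look like: <!-- BEGIN dih -->  and  <!-- END dih -->
    private static final Pattern BEGIN = Pattern.compile("\\s*<!--\\s*BEGIN\\s+(\\S+)\\s*-->\\s*");
    private static final Pattern END = Pattern.compile("\\s*<!--\\s*END\\s+(\\S+)\\s*-->\\s*");

    /** Returns the lines that are common to all configs or belong to a selected feature. */
    static List<String> select(List<String> lines, Set<String> features) {
        List<String> out = new ArrayList<>();
        String current = null; // feature whose block we are inside, or null when outside any block
        for (String line : lines) {
            Matcher begin = BEGIN.matcher(line);
            Matcher end = END.matcher(line);
            if (begin.matches()) {
                current = begin.group(1);
            } else if (end.matches()) {
                current = null;
            } else if (current == null || features.contains(current)) {
                out.add(line);
            }
        }
        return out;
    }
}
```

Selecting every feature at once reproduces the kitchen sink minus the marker lines, which is what makes the "test the whole thing, ship the pieces" approach workable.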


-Hoss


Re: Make ant example faster

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 20, 2009, at 5:45 PM, Chris Hostetter wrote:

>
> : Fair enough.  FWIW, I'd still like to be able to generate a Solr container
> : from an example (i.e. "minimal" or "DIH" or whatever)
>
> by "container" do you mean a Solr home with configs and necessary libs
> ready to be tweaked to suit your purposes?
>
> assuming we have more use-case specific examples, wouldn't that just be
> something that copies one of them to a target directory?

I guess what I really want is a way to be able to say:  Give me a Solr  
home that has these X features (DIH, Solr Cell, spell checking,  
highlighting, plus whatever libs are needed) with some basic  
configuration + my choice of a schema ranging from one that is  
barebones (maybe just an "id" field defined) to a "full fledged" one  
(the current example) and I want to be able to do it as simple as  
possible (i.e. as few commands as possible).

-Grant

Re: Make ant example faster

Posted by Chris Hostetter <ho...@fucit.org>.
: Fair enough.  FWIW, I'd still like to be able to generate a Solr container
: from an example (i.e. "minimal" or "DIH" or whatever)

by "container" do you mean a Solr home with configs and necessary libs 
ready to be tweaked to suit your purposes?

assuming we have more use-case specific examples, wouldn't that just be 
something that copies one of them to a target directory?



-Hoss


Re: Make ant example faster

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 16, 2009, at 7:37 PM, Chris Hostetter wrote:

>
> : It is similar, indeed, but I think it results in there only ever being one
> : active Solr example and the user need not worry about setting solr home.
>
> Hmmm... this seems like a bad idea.
>
> we want to make sure that *users* who have downloaded Solr can run all of
> the examples without needing ant ... having a single "active" example and
> using an ant target to change it would mean that if i install solr and
> then go through the tutorial (using the tutorial example), i would need to
> (understand and run) ant to see the DIH example.
>
> It seems like it would make a lot more sense to have lots of examples and
> let the user set the solr home to try them out -- that's very easy to do.
>
> I'm not sure i really understand the concern about how long "ant example"
> takes ... it's a build time task, and it only takes ~15 seconds on my box
> if everything is up to date (if everything isn't up to date then
> compilation is going to take much longer than what "example" does) ... the
> longest contributor to the time seems to be contrib/javascript's "docs"
> target -- but i'm guessing some ant tricks to check directory mod times
> before running the jsrun.jar could shave that off as well.


Fair enough.  FWIW, I'd still like to be able to generate a Solr  
container from an example (i.e. "minimal" or "DIH" or whatever)

Re: Make ant example faster

Posted by Chris Hostetter <ho...@fucit.org>.
: It is similar, indeed, but I think it results in there only ever being one
: active Solr example and the user need not worry about setting solr home.

Hmmm... this seems like a bad idea.

we want to make sure that *users* who have downloaded Solr can run all of 
the examples without needing ant ... having a single "active" example and 
using an ant target to change it would mean that if i install solr and 
then go through the tutorial (using the tutorial example), i would need to 
(understand and run) ant to see the DIH example.

It seems like it would make a lot more sense to have lots of examples and 
let the user set the solr home to try them out -- that's very easy to do.

I'm not sure i really understand the concern about how long "ant example" 
takes ... it's a build time task, and it only takes ~15 seconds on my box 
if everything is up to date (if everything isn't up to date then 
compilation is going to take much longer than what "example" does) ... the 
longest contributor to the time seems to be contrib/javascript's "docs" 
target -- but i'm guessing some ant tricks to check directory mod times 
before running the jsrun.jar could shave that off as well.
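
In the build itself this would presumably be done with Ant's own uptodate task rather than custom code, but the check it amounts to is simple. A sketch of the logic in plain Java, with made-up names (UpToDateCheck, needsRebuild) and assuming a Java 8+ runtime: regenerate the docs only if some source file is newer than everything already in the output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of a modification-time check; not part of the Solr build.
final class UpToDateCheck {

    /** Latest modification time of any regular file under root, or the epoch if there are none. */
    static FileTime newest(Path root) throws IOException {
        FileTime latest = FileTime.fromMillis(0);
        if (!Files.exists(root)) {
            return latest;
        }
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.compareTo(latest) > 0) {
                latest = modified;
            }
        }
        return latest;
    }

    /** True if the output is missing or any source file is newer than the newest output file. */
    static boolean needsRebuild(Path sources, Path output) throws IOException {
        if (!Files.exists(output)) {
            return true;
        }
        return newest(sources).compareTo(newest(output)) > 0;
    }
}
```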

-Hoss


Re: Make ant example faster

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Apr 14, 2009 at 4:25 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
>
>>
>> Isn't this the same as the current setup with the name of the directory
>> changed and different ant targets to set them up? The new ant target will
>> set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
>> and avoid the need to type -Dsolr.solr.home.
>>
>
>
> It is similar, indeed, but I think it results in there only ever being one
> active Solr example and the user need not worry about setting solr home.
>

+1

Let's do it.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Make ant example faster

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:

> On Tue, Apr 14, 2009 at 12:33 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
>>
>> Instead of a kitchen-sink example directory, we "revert" it back to being
>> the tutorial example.  It still can get built by ant example, but ultimately
>> we "deprecate" it (more later).
>>
>> Then, as a replacement, we create a directory containing what I would call
>> Solr Templates, which contain subdirectories named appropriately for the
>> kind of example.  Rather than explain, I'll give an example:
>>
>> The templates directory would contain the configurations (i.e. schema.xml
>> and solrconfig.xml) and any sample docs (but not the libraries) for:
>>       tutorial - The current tutorial example
>>       dih - The DIH example
>>       extraction - Solr Cell example
>>       geo - geo spatial example (once 773 is committed)
>>       clustering - once SOLR-769 is committed
>>       simple - A barebones schema and config (mainly used for
>> bootstrapping a new project for experienced users)
>>       exploratory - Basically, the same as simple, but the schema defines
>> a single dynamic field -  Think of Hoss's Solr Out of the Box talk from
>> ApacheCon whereby you want to quickly explore a new data set without having
>> to define a schema.
>>       [other] -
>>
>> Note, the templates directory could also live under each contrib, but it
>> isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not
>> contrib-specific).
>>
>> Then, typing "ant example" would copy the necessary tutorial stuff to the
>> example directory (which still contains the Jetty stuff) but would not have
>> to recurse into any of the contribs.
>>
>> Typing "ant example -Dtype=clustering"  would copy the clustering
>> requirements, plus go to contrib/clustering (or whatever) and get the
>> appropriate material such that the example directory.  Similarly for any of
>> the other "templates"
>>
>
> Isn't this the same as the current setup with the name of the directory
> changed and different ant targets to set them up? The new ant target will
> set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
> and avoid the need to type -Dsolr.solr.home.


It is similar, indeed, but I think it results in there only ever being  
one active Solr example and the user need not worry about setting solr  
home.

-Grant

Re: Make ant example faster

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Apr 14, 2009 at 12:33 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> Instead of a kitchen-sink example directory, we "revert" it back to being
> the tutorial example.  It still can get built by ant example, but ultimately
> we "deprecate" it (more later).
>
> Then, as a replacement, we create a directory containing what I would call
> Solr Templates, which contain subdirectories named appropriately for the
> kind of example.  Rather than explain, I'll give an example:
>
> The templates directory would contain the configurations (i.e. schema.xml
> and solrconfig.xml) and any sample docs (but not the libraries) for:
>        tutorial - The current tutorial example
>        dih - The DIH example
>        extraction - Solr Cell example
>        geo - geo spatial example (once 773 is committed)
>        clustering - once SOLR-769 is committed
>        simple - A barebones schema and config (mainly used for
> bootstrapping a new project for experienced users)
>        exploratory - Basically, the same as simple, but the schema defines
> a single dynamic field -  Think of Hoss's Solr Out of the Box talk from
> ApacheCon whereby you want to quickly explore a new data set without having
> to define a schema.
>        [other] -
>
> Note, the templates directory could also live under each contrib, but it
> isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not
> contrib-specific).
>
> Then, typing "ant example" would copy the necessary tutorial stuff to the
> example directory (which still contains the Jetty stuff) but would not have
> to recurse into any of the contribs.
>
> Typing "ant example -Dtype=clustering"  would copy the clustering
> requirements, plus go to contrib/clustering (or whatever) and get the
> appropriate material such that the example directory.  Similarly for any of
> the other "templates"
>

Isn't this the same as the current setup with the name of the directory
changed and different ant targets to set them up? The new ant target will
set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
and avoid the need to type -Dsolr.solr.home.


>
> Additionally, you could also define -DoutputDir such that it would take and
> copy the whole example directory (including the appropriate type) to some
> output dir.  This would allow one to quickly bootstrap a Solr project
> without having to do a lot of schema editing.
>

I like this idea. I have myself needed to do this a couple of times.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Make ant example faster

Posted by Grant Ingersoll <gs...@apache.org>.
Funny you should mention it, b/c I had an idea the other day of how to  
speed all this up, which would also address one of my other annoyances  
with the example and make it easier for people to get started (I think).  
So, here goes:

Instead of a kitchen-sink example directory, we "revert" it back to  
being the tutorial example.  It still can get built by ant example,  
but ultimately we "deprecate" it (more later).

Then, as a replacement, we create a directory containing what I would  
call Solr Templates, which contain subdirectories named appropriately  
for the kind of example.  Rather than explain, I'll give an example:

The templates directory would contain the configurations (i.e.  
schema.xml and solrconfig.xml) and any sample docs (but not the  
libraries) for:
	tutorial - The current tutorial example
	dih - The DIH example
	extraction - Solr Cell example
	geo - geo spatial example (once 773 is committed)
	clustering - once SOLR-769 is committed
	simple - A barebones schema and config (mainly used for bootstrapping  
a new project for experienced users)
	exploratory - Basically, the same as simple, but the schema defines a  
single dynamic field -  Think of Hoss's Solr Out of the Box talk from  
ApacheCon whereby you want to quickly explore a new data set without  
having to define a schema.
	[other] -

Note, the templates directory could also live under each contrib, but  
it isn't necessarily a 1-1 thing (e.g. simple and exploratory  
templates are not contrib-specific).

Then, typing "ant example" would copy the necessary tutorial stuff to  
the example directory (which still contains the Jetty stuff) but would  
not have to recurse into any of the contribs.

Typing "ant example -Dtype=clustering"  would copy the clustering  
requirements, plus go to contrib/clustering (or whatever) and get the  
appropriate material such that the example directory.  Similarly for  
any of the other "templates"

Additionally, you could also define -DoutputDir such that it would  
take and copy the whole example directory (including the appropriate  
type) to some output dir.  This would allow one to quickly bootstrap a  
Solr project without having to do a lot of schema editing.
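
To show how little is involved, here is a sketch of the copy step in plain Java rather than Ant. Everything named here is hypothetical: the class TemplateInstaller, the layout templates/<type>/..., and the idea that the type maps straight to a subdirectory are assumptions for illustration, and it assumes a Java 8+ runtime. It does not fetch any contrib libraries; it only copies the template's config files and sample docs into a target directory (the example directory, or whatever -DoutputDir would point at).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; the real thing would more likely be an Ant copy task.
final class TemplateInstaller {

    /** Copies templatesDir/type (recursively) into targetDir, overwriting existing files. */
    static void install(Path templatesDir, String type, Path targetDir) throws IOException {
        Path source = templatesDir.resolve(type);
        if (!Files.isDirectory(source)) {
            throw new IllegalArgumentException("Unknown template: " + type);
        }
        List<Path> entries;
        try (Stream<Path> paths = Files.walk(source)) {
            entries = paths.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            // Re-root each entry under the target directory.
            Path dest = targetDir.resolve(source.relativize(entry).toString());
            if (Files.isDirectory(entry)) {
                Files.createDirectories(dest);
            } else {
                Files.createDirectories(dest.getParent());
                Files.copy(entry, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

A call such as TemplateInstaller.install(templates, "clustering", exampleSolrHome) would then correspond to the -Dtype=clustering case, with the two Path arguments being wherever the build decides those directories live.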

WDYT?

-Grant

	


On Apr 13, 2009, at 1:56 PM, Shalin Shekhar Mangar wrote:

> Hello,
>
> As part of SOLR-934, I'd like to set up an example for indexing mail boxes
> with the existing example/example-DIH demo. I see that ant example has a
> dependency on example-contrib. Do we want to do that? I vaguely remember
> Yonik complaining about the time ant example takes.
>
> For setting up the MailEntityProcessor, I'd have to copy mail, activation
> and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
> How about we remove the dependency on example-contrib and keep it as an
> independent target?
>
> -- 
> Regards,
> Shalin Shekhar Mangar.