Posted to solr-dev@lucene.apache.org by Yonik Seeley <ys...@gmail.com> on 2006/01/25 17:44:18 UTC

code contribution

The source code has been uploaded to http://issues.apache.org/jira/browse/SOLR-1
(this was a way to refer to something concrete in the software grant license).

It's not in a production-ready state anymore since I had to rip it out
of its CNET environment (we have our own package management and deployment
system).  It doesn't even have an ant build file anymore.

The next steps are probably
- get stuff into SVN (should we wait for license to be ack'd first?)
- make a website
- Take a week to make stuff passably presentable,
  then let the lucene community know about Solr.

Some quick notes on the current state of things below:

-Yonik


Code:
  - I changed the packages to org.apache.solr, created some new packages
    and moved some classes around, applied apache license to *java
  - modified things enough to get it to run under Tomcat 5.5
  - Some class names still start with Solar, instead of Solr.
    Should this be changed?

Servlet:
  - Right now it must be ROOT (just due to URLs in the admin page I think).
    I assume people will want this changed (should be /solr I guess)

Admin:
  - use firefox for now to see the admin interface... the stylesheet
    doesn't seem to work on IE
  - some links still point to CNET... remove or find replacements
  - the JSPs are a hack... need to refactor all the repeated code
    at the start of each JSP
  - replace resin specific stuff (like the check for resin.pid, and
    resin-status link)

Config:
  - The default index directory is $CWD/index...
    in CNET installations, the CWD was always the resin3 install directory.
    In other containers, this can obviously be different.
    In tomcat, if I cd to bin to execute startup.sh, that is where the
    index directory will be created.
  - We used a resin feature to include another directory in the classpath
    and put configuration files there (keeping config separate from the
    solr "binary").  All config files were loaded via classloader.
    If you want to try something right now, copy config files such as
    schema.xml and solarconfig.xml into the WEB-INF/classes so they
    can be found.
  - We had a web.external.xml file that was included via an XML entity
    into the solr web.xml to allow changes and extensions w/o having
    to edit the solr one.  This may have to go.
  - for examples of schema.xml and solarconfig.xml, see src/testapps/SolarTest
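
A minimal sketch of the classloader lookup described above, assuming the
config files sit somewhere on the webapp classpath (for example in
WEB-INF/classes).  The class name ConfigLoader and the method openConfig
are hypothetical and are not taken from the uploaded code.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the uploaded source: finds a config file
// such as schema.xml through the classloader instead of a file system path.
final class ConfigLoader {
    private ConfigLoader() {}

    /** Opens a named classpath resource, or throws if it cannot be found. */
    static InputStream openConfig(String name) throws IOException {
        // Prefer the context classloader, which in a servlet container is
        // normally the webapp's loader; fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfigLoader.class.getClassLoader();
        }
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Config file not found on classpath: " + name);
        }
        return in;
    }
}
```

A caller would use something like ConfigLoader.openConfig("schema.xml") and
close the stream when done.  Failing loudly on a missing file seems
friendlier than a later NullPointerException.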

Test:
  - A majority of the tests are done via a test program running over a
    test file.
    Lines starting with < are considered update commands, and other
    lines are considered query commands.  Output is checked via xpath
    statements.  See src/testapps/SolarTest/newtest.txt
  - need something more flexible longer term...
  - move ad-hoc tests to the test directory and convert to junit
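
The line classification used by the test file can be sketched as below.
This is not the actual test program: the names TestFileParser, Kind and
Command are hypothetical, and skipping blank lines and trimming leading
whitespace are assumptions, since the real format may treat them
differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "lines starting with < are updates" rule.
final class TestFileParser {
    enum Kind { UPDATE, QUERY }

    static final class Command {
        final Kind kind;
        final String text;

        Command(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private TestFileParser() {}

    /** Classifies each non-blank line as an update or a query command. */
    static List<Command> parse(List<String> lines) {
        List<Command> commands = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no command
            }
            Kind kind = trimmed.startsWith("<") ? Kind.UPDATE : Kind.QUERY;
            commands.add(new Command(kind, trimmed));
        }
        return commands;
    }
}
```

The xpath checks on the output are left out here; they would hang off each
query command.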

Documentation:
  - There is a lot of internal documentation but it's intermixed with CNET
    specific stuff, and there is way too much of it.
    I'll be starting from pseudo-scratch...  most important stuff is
    quickstart guide,
    query params
    result XML format
    update XML format
    analysis stuff (available tokenizers and filters)

Scripts:
  - These were templatized by our internal package management software.
    In the absence of that, config stuff such as port numbers and machine names
    should be factored out to a single place.

Replication:
  - Uses the strategy outlined by Doug for Technorati, and relies on scripts,
    ssh, and rsync, and only works on UNIX filesystems with hard links.
  - snapshooter script is triggered on a commit (post-commit hook configurable
    via solarconfig.xml)
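
To illustrate why hard links matter here: a snapshot of the index can be
taken by linking each index file into a new directory, which copies no data.
The real snapshooter is a shell script; the Java below is only a sketch of
the same idea, and the class name HardLinkSnapshot and both directory
arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of a hard-link snapshot; not the actual script.
final class HardLinkSnapshot {
    private HardLinkSnapshot() {}

    /**
     * Creates snapshotDir and hard-links every regular file of indexDir
     * into it.  Both directories must be on the same file system, and that
     * file system must support hard links.
     */
    static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new name comes first.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                }
            }
        }
    }
}
```

Lucene index files are not modified once written, which is what makes a
linked snapshot safe to rsync later.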

Demo:
  - we need a demo or out-of-the-box schema and configuration (and
    maybe something that adds some data to the index).

Re: code contribution

Posted by Yoav Shapira <yo...@apache.org>.
Howdy,


> > >  It doesn't even have an ant build file anymore.
> >
> > That'd be good to have, and I'm willing to help write it.
>
> Great!  I'm an ant novice (I'm much better at make), so we would
> probably get better results if someone else started it off.  I was
> just going to copy the one from nutch and hack it until it worked.

I've just committed a starting version with the basic functionality.

There are still some cleanup items needed:
- lucene_extras should be renamed to something clearer, and the one Test
class in it that depends on JUnit should be moved to the test
subfolder
- The XPP jar in the lib directory has an O (the letter oh) where a 0
(the digit zero) would make more sense in its name.  Is the letter
intentional?  I wasn't sure, so I didn't want to change it.
- I added a top-level license file as required by all ASF projects,
but we probably need a NOTICE.txt file as well covering the XPP
license, if we keep using XPP.

> That was my assumption, and I had tried it but it didn't work.  Must
> have been my mistake somewhere.

It may be in the servlet code, I'll take a look when I get a chance.

> Definitely should be configurable.
> I do want some solution so that people can get started quickly
> though... just copy a generic solr.war into webapps, copy over a
> config directory, then start the appserver.
>
> So using cwd as a default would allow people to get up and running w/o
> configuration.  Is there an easier/better way?

Perhaps using the java.io.tmpdir system property or the servlet
container's javax.servlet.context.tempdir context attribute.
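
A minimal sketch of that kind of lookup, assuming an explicit setting wins
and the temp directory is only a fallback.  The property name
"solr.index.dir" and the class IndexDirResolver are hypothetical; only
java.io.tmpdir is a standard JVM property.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolver: explicit setting first, temp directory as default.
final class IndexDirResolver {
    // Assumed property name, chosen for illustration only.
    static final String INDEX_DIR_PROPERTY = "solr.index.dir";

    private IndexDirResolver() {}

    /** Returns the configured index directory, or <tmpdir>/index if unset. */
    static Path resolve() {
        String configured = System.getProperty(INDEX_DIR_PROPERTY);
        if (configured != null && !configured.trim().isEmpty()) {
            return Paths.get(configured.trim()).toAbsolutePath();
        }
        // Default that does not depend on the current working directory.
        return Paths.get(System.getProperty("java.io.tmpdir"), "index").toAbsolutePath();
    }
}
```

With this, starting the container with -Dsolr.index.dir=/some/path would
override the default, and nothing depends on where startup.sh was run from.
A temp directory may be cleaned by the OS, so it is only reasonable as a
getting-started default.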

> When you do want to specify where config is, and where the index
> directory should be, what's the best way to do this?  Is there any
> other good way rather than adding to web.xml as a context or servlet
> param?

Putting them on the classpath using a classloader reference to look
them up, or using a simple configuration file that's not tied to the
servlet container.  The web.xml approach is fine but it has at least
one serious drawback in that it makes out-of-container testing much
more difficult.

> Hmmm, OK. Must have been my mistake.
> Does the external file have to be in the webapp?  What we had done
> with Resin was access outside it (../../conf/solar/web.external.xml).

I don't think so, but the user account running the server must have
read permissions on the directory.

--
Yoav Shapira
System Design and Management Fellow
MIT Sloan School of Management
Cambridge, MA, USA
yoavs@computer.org / www.yoavshapira.com

Re: code contribution

Posted by Yonik Seeley <ys...@gmail.com>.
On 1/25/06, Yoav Shapira <yo...@apache.org> wrote:
> Hi,
>
> >  It doesn't even have an ant build file anymore.
>
> That'd be good to have, and I'm willing to help write it.

Great!  I'm an ant novice (I'm much better at make), so we would
probably get better results if someone else started it off.  I was
just going to copy the one from nutch and hack it until it worked.

> +1 to solr rather than Solar, for consistency.

I think I got all those cases now.

> > Servlet:
> >   - Right now it must be ROOT (just due to URLs in the admin page I think).
> >     I assume people will want this changed (should be /solr I guess)
>
> Should be relative URLs so it works with any context path.  But not urgent.

That was my assumption, and I had tried it but it didn't work.  Must
have been my mistake somewhere.

> Reliance on the operating system's current working directory is
> dangerous.  It can be used as a default perhaps, but solr should
> provide a mechanism to indicate where the index directory should be.

Definitely should be configurable.
I do want some solution so that people can get started quickly
though... just copy a generic solr.war into webapps, copy over a
config directory, then start the appserver.

So using cwd as a default would allow people to get up and running w/o
configuration.  Is there an easier/better way?

When you do want to specify where config is, and where the index
directory should be, what's the best way to do this?  Is there any
other good way rather than adding to web.xml as a context or servlet
param?


> >   - We had a web.external.xml file that was included via an XML entity
> >     into the solr web.xml to allow changes and extensions w/o having
> >     to edit the solr one.  This may have to go.
>
> It's not necessarily a bad thing, and should work more or less
> universally

Hmmm, OK. Must have been my mistake.
Does the external file have to be in the webapp?  What we had done
with Resin was access outside it (../../conf/solar/web.external.xml).

-Yonik

Re: code contribution

Posted by Yoav Shapira <yo...@apache.org>.
Hi,

>  It doesn't even have an ant build file anymore.

That'd be good to have, and I'm willing to help write it.

>   - I changed the packages to org.apache.solr, created some new packages
>     and moved some classes around, applied apache license to *java
>   - modified things enough to get it to run under Tomcat 5.5
>   - Some class names still start with Solar, instead of Solr.
>     Should this be changed?

+1 for the same reasons as Doug: now's the time to change them.  And
+1 to solr rather than Solar, for consistency.

> Servlet:
>   - Right now it must be ROOT (just due to URLs in the admin page I think).
>     I assume people will want this changed (should be /solr I guess)

Should be relative URLs so it works with any context path.  But not urgent.
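
To show why relative links are independent of the context path, here is a
small sketch using java.net.URI.  The host, port and page names (index.jsp,
stats.jsp) are made up for illustration and are not necessarily the real
admin pages.

```java
import java.net.URI;

// Hypothetical demo: a relative link resolves against whatever URL the
// page was served from, so the same JSP works under ROOT or /solr.
final class RelativeLinks {
    private RelativeLinks() {}

    /** Resolves a relative link the way a browser would for the given page. */
    static URI resolve(String pageUrl, String relativeLink) {
        return URI.create(pageUrl).resolve(relativeLink);
    }

    public static void main(String[] args) {
        // Deployed as ROOT
        System.out.println(resolve("http://localhost:8080/admin/index.jsp", "stats.jsp"));
        // Deployed under /solr
        System.out.println(resolve("http://localhost:8080/solr/admin/index.jsp", "stats.jsp"));
    }
}
```

A link written as "/admin/stats.jsp" would only work in the ROOT case, which
matches the behaviour described above.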

>     In tomcat, if I cd to bin to execute startup.sh, that is where the
>     index directory will be created.

Reliance on the operating system's current working directory is
dangerous.  It can be used as a default perhaps, but solr should
provide a mechanism to indicate where the index directory should be. 
Some server admins don't allow the server account write access to the
server's bin directory, and this is actually a good / prudent security
measure.

>   - We had a web.external.xml file that was included via an XML entity
>     into the solr web.xml to allow changes and extensions w/o having
>     to edit the solr one.  This may have to go.

It's not necessarily a bad thing, and should work more or less
universally, so I wouldn't rush to junk it -- there's enough other
work to do ;)

--
Yoav Shapira
System Design and Management Fellow
MIT Sloan School of Management
Cambridge, MA, USA
yoavs@computer.org / www.yoavshapira.com

Re: code contribution

Posted by Doug Cutting <cu...@apache.org>.
Yonik Seeley wrote:
> Code:
>   - I changed the packages to org.apache.solr, created some new packages
>     and moved some classes around, applied apache license to *java
>   - modified things enough to get it to run under Tomcat 5.5
>   - Some class names still start with Solar, instead of Solr.
>     Should this be changed?

+1  It will never be easier to change the names than now, and consistent 
naming is a good thing.

> Servlet:
>   - Right now it must be ROOT (just due to URLs in the admin page I think).
>     I assume people will want this changed (should be /solr I guess)

File a bug report in JIRA for this.  Yes, it's a bug, but low priority 
at this point.  Most of the other stuff you list could also make it into 
bug reports, unless you intend to complete all of these before anyone 
else looks at the code.

I'd focus on documentation and the demo before you make any announcements.

Cheers!

Doug

Re: code contribution

Posted by Ian Holsman <li...@holsman.net>.
Yonik Seeley wrote:
> An actual demo hosted by apache would be cool (and useful for people
> who wanted to get a peek at Solr without having to install it), but
> it's not what I was talking about in my email.

ok.. I'll go check to see if I can get a zone set up.


> As far as dmoz goes, I don't know too much about it, but we want to
> avoid confusing people about what Solr is (it's not nutch, which is
> the best solution for crawling, indexing, and searching the web).
> 
which brings up an interesting point.
we probably need a FAQ entry on Solr for confused people like me.
basically

what it is
what problem it is trying to solve
and what it isn't.

> -Yonik
> 
> On 1/25/06, Ian Holsman <li...@holsman.net> wrote:
>> Yonik Seeley wrote:
>>
>>> Demo:
>>>   - we need a demo or out-of-the-box schema and configuration (and
>>>     maybe something that adds some data to the index).
>>>
>> maybe we could do a dmoz demo? (we have one internally @cnet already done)
>>
>> We could have it running on a weekly schedule on one of the apache zones.
>>
> 


Re: code contribution

Posted by Yonik Seeley <ys...@gmail.com>.
An actual demo hosted by apache would be cool (and useful for people
who wanted to get a peek at Solr without having to install it), but
it's not what I was talking about in my email.

I was thinking something small that can be checked into SVN as a
regular part of Solr,  so when someone downloads Solr, they can do the
following:
  - build solr
  - deploy solr and the demo schema to tomcat
  - run some kind of command to add data to the index
  - go to the admin interface and execute some queries

The "some kind of command to add data" to the index could just be a
shell script with curl commands (Right now, the only way to update
Solr is by posting an XML document).  I think that would be most
instructive to a new developer.
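
For comparison with the curl approach, posting an update document from Java
could look roughly like the sketch below.  The <add>/<doc>/<field> element
names, the update URL passed by the caller, and the class name UpdatePoster
are assumptions for illustration; this thread does not spell out the update
XML format.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical client sketch; element names and URL are assumptions.
final class UpdatePoster {
    private UpdatePoster() {}

    /** Escapes the characters that are unsafe in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds a single-document add command from field name/value pairs. */
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** POSTs the XML body to the given URL and returns the HTTP status code. */
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A shell script with curl is still less to read for a first demo; this only
shows that nothing beyond an HTTP POST of XML is involved.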

As far as dmoz goes, I don't know too much about it, but we want to
avoid confusing people about what Solr is (it's not nutch, which is
the best solution for crawling, indexing, and searching the web).

-Yonik

On 1/25/06, Ian Holsman <li...@holsman.net> wrote:
> Yonik Seeley wrote:
>
> > Demo:
> >   - we need a demo or out-of-the-box schema and configuration (and
> >     maybe something that adds some data to the index).
> >
>
> maybe we could do a dmoz demo? (we have one internally @cnet already done)
>
> We could have it running on a weekly schedule on one of the apache zones.
>

Re: code contribution

Posted by Ian Holsman <li...@holsman.net>.
Yonik Seeley wrote:

> Demo:
>   - we need a demo or out-of-the-box schema and configuration (and
>     maybe something that adds some data to the index).
> 

maybe we could do a dmoz demo? (we have one internally @cnet already done)

We could have it running on a weekly schedule on one of the apache zones.