Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@red-bean.com> on 2005/10/30 21:58:07 UTC
Repository Indexer demo (was Re: full text search source code, and change sets)
On Thu, 27 Oct 2005, Marcus Rueckert wrote:
> > sussman mentioned on irc and i think on the mailing list too that they did
> > an integration of svn and pylucene via a post commit hook script.
OK, I cleaned it up, and the lucene-libsvn_fs demo works. You can
check out the project here:
http://svn.red-bean.com/repos/sussman/software/subversion/ReposIndexer
Here's the README file:
----------
This is a proof-of-concept: it demonstrates how one can hook up a
text-indexing engine to a Subversion repository.
Specifically, it connects the 'lupy' module
(http://divmod.org/projects/lupy), a Python port of the famous
Lucene indexer that is included in this package, with calls to the
libsvn_fs Python bindings.
---------------------------------------------------------------------
DISCLAIMER: 'lupy' is now retired software. You should be using
'PyLucene' instead, located at http://pylucene.osafoundation.org/.
Rumor is that it's very easy to convert a lupy application into a
PyLucene one via simple search and replace.
---------------------------------------------------------------------
To try this demo:
1. Make sure you have the Subversion SWIG/Python bindings installed.
To verify this, start the Python interpreter and check that the
statement 'import svn.fs' runs without error.
2. Create an index of a single revision (say, revision 1) by running
the 'svn_index.py' script against some repository:
$ ./svn_index.py /path/to/repos 1 myindex
Indexing changed file: (1, /libsvn_delta/xml_parse.c)
...done.
Indexing changed file: (1, /libsvn_delta/delta.h)
...done.
Indexing changed file: (1, /libsvn_delta/path_driver.c)
...done.
[...]
This creates a directory 'myindex' containing indexed data for all
the *.c and *.h paths changed in revision 1. Ideally, we would
want to index files that match other patterns as well, and we'd
probably want to index more than a single revision!
3. Search the index for a term:
$ ./svn_search.py "txdelta" myindex
Found in (1, /libsvn_delta/compose_delta.c).
Notice that each hit comes back as a (revision, path) pair.
That's because the indexing script has declared each "key" to be
of that form.
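
To make the (revision, path) key idea concrete, here is a minimal
sketch in plain Python. It is not the demo's code and does not use
lupy's API: 'ToyIndex', 'wanted' and 'PATTERNS' are hypothetical names,
the index lives in memory rather than in a directory, and the file
contents are made up for illustration.

```python
import fnmatch
import re
from collections import defaultdict

# Glob patterns for files worth indexing. The demo only handles *.c and
# *.h; extending this tuple is one way to "match other patterns".
PATTERNS = ("*.c", "*.h")


def wanted(path, patterns=PATTERNS):
    """Return True if the basename of a repository path matches a pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


class ToyIndex:
    """In-memory stand-in for the on-disk index (an assumption, not lupy)."""

    def __init__(self):
        # term -> set of (revision, path) keys in which the term occurs
        self._postings = defaultdict(set)

    def add(self, revision, path, text):
        """Index one file's text under the key (revision, path)."""
        key = (revision, path)
        for term in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
            self._postings[term.lower()].add(key)

    def search(self, term):
        """Return the sorted list of (revision, path) keys for a term."""
        return sorted(self._postings.get(term.lower(), ()))


# Made-up contents for two changed paths in revision 1.
changed = {
    "/libsvn_delta/compose_delta.c": "compose two txdelta windows",
    "/README": "notes that mention txdelta",
}

index = ToyIndex()
for path, text in changed.items():
    if wanted(path):  # /README is skipped: it matches no pattern
        index.add(1, path, text)

print(index.search("txdelta"))
```

Because the key carries the revision, the same path indexed at two
different revisions would come back as two separate hits.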
Presumably, one could develop this demo into a full-fledged
post-commit hook which indexes the changed paths of each newly created
revision, augmenting an ever-growing server-side index. One could
also then write a nice CGI script to search the index.
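
A rough sketch of what such a hook might look like follows. Subversion
runs a repository's post-commit hook with the repository path and the
new revision number as its first two arguments; everything else here is
an assumption. In particular, INDEXER and INDEX_DIR are hypothetical
paths, and the sketch assumes that svn_index.py can add a revision to
an existing index, which the README does not confirm.

```python
import subprocess
import sys

# Hypothetical locations; adjust for a real installation.
INDEXER = "/usr/local/bin/svn_index.py"
INDEX_DIR = "/var/lib/svn-index/myindex"


def build_command(repos_path, revision, indexer=INDEXER, index_dir=INDEX_DIR):
    """Build the argument list for indexing one revision.

    Mirrors the invocation './svn_index.py /path/to/repos 1 myindex'.
    int() rejects anything that is not a revision number.
    """
    return [indexer, repos_path, str(int(revision)), index_dir]


def main(argv):
    """Entry point: argv[1] is the repository path, argv[2] the revision."""
    if len(argv) < 3:
        sys.stderr.write("usage: post-commit REPOS REV\n")
        return 2
    # Return the indexer's exit status so a failure is visible to the caller.
    return subprocess.call(build_command(argv[1], argv[2]))


# In an actual hooks/post-commit file, finish with:
#     sys.exit(main(sys.argv))
```

Since a post-commit hook runs after the revision already exists, a
failure in the indexer cannot undo the commit; it only means that
revision is missing from the index until it is indexed again.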